EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT 





Volume I JULY, 1941 





A NEW PERFORMANCE TEST FOR YOUNG DEAF CHILDREN
Marshall S. Hiskey

PERFORMANCE TESTING IN PUBLIC PERSONNEL SELECTION
Sidney W. Koran

SOME DATA ON THE KUDER PREFERENCE RECORD
Arthur E. Traxler and William C. McCall

THE RELIABILITY OF RATIO SCORES
Lee J. Cronbach

GUIDING STUDENTS TO BECOME SELF-GUIDING
Joseph S. Kopas

AN ATTEMPT TO MEASURE SCIENTIFIC THINKING
Max D. Engelhart and Hugh B. Lewis

AN EVALUATION OF TECHNIQUES OF MEASURING VISUAL ACUITY
AT THE COLLEGE LEVEL
Frances Oralind Triggs and Karl E. Sandt

THE CONCEPT OF SCATTER IN THE LIGHT OF MENTAL TEST
THEORY
Maurice Lorr and Ralph K. Meister

MEASUREMENT ABSTRACTS

MEASUREMENT NEWS








Copyright, 1941, by 
SCIENCE RESEARCH ASSOCIATES 


PRINTED IN THE UNITED STATES OF AMERICA 














A NEW PERFORMANCE TEST FOR YOUNG 
DEAF CHILDREN 


MARSHALL S. HISKEY¹
University of Nebraska 


Introduction 


THERE has long been a need for a measuring device
which would give the teacher of the very young deaf
child a valid indication of his learning level at the begin- 
ning of his educational career. One can find a consider- 
able number of more or less carefully worked out mental 
tests which have been used for deaf and hard-of-hearing 
individuals. However, the degree of help which such tests 
render the educator or clinician depends upon the num- 
ber and representativeness of the children upon whom 
they have been standardized, and also upon the reliability 
and amount of information about the children which the 
test makes available. Few tests have been standardized 
on deaf children or used with such children at the begin- 
ning of their school experience. 

Instruction, especially at the lower levels, although
carried on as a group activity, actually involves considerable
individualized work. This individualization is, in
many instances, primarily a means of preparing for group
instruction. Therefore, if classes are not composed of
students of approximately the same level of ability, the
teacher must spend entirely too much of her time working
with the slower pupils as individuals. In many instances
this is done at the expense of the more capable students
and often results in a great waste of time, since it is difficult
to keep the young deaf child occupied constructively
without the direct, and almost constant, guidance and
supervision of the teacher. If supplementary measuring
devices are valuable in making the school program more
effective for the hearing child, then they should be even
more valuable with a group who must start with the
handicap of deafness.

¹ The writer wishes to acknowledge his indebtedness to Dr. D. A. Worcester
and other staff members of the Department of Educational Psychology and
Measurements of the University of Nebraska and to the administrations of the
Iowa, Nebraska, Kansas, Missouri, Illinois, Indiana, and Ohio State Schools
for the Deaf.


Difficulties involved in constructing a test for the 
young deaf and hard-of-hearing. In the selection of items 
for young deaf children, the special limitations of this 
group must be kept constantly in mind. The actual test- 
ing of deaf children presents problems which are unique. 
Practically every impression of the test materials gained 
by the deaf child must be through the sense of sight. All 
instructions must be given through pantomime. Because 
of the child’s complete lack of language experience, the 
test items must have an unusual intrinsic attractiveness. 
In addition to these problems one must devise a sufficient 
variety of items to sample adequately the abilities of indi- 
viduals whose range of experiences has been seriously 
restricted. 

To attempt to obtain a rating of the “word fluency” of 
the child who has been deaf since birth would be futile. 
Nor does it seem appropriate to include speed tests since 
it is very difficult to give to the young deaf child the con- 
cept of speed. 

Based on the observations gained through testing the
members of both groups, the writer is of the opinion that 
deaf subjects are more prone to “jump to conclusions” 
and to overestimate their abilities or the amount of mate- 
rial which they have grasped, than are hearing subjects. 
It is necessary to make them take their allotted time for 
viewing materials before they attempt a response. On the
other hand, the examiner must always be on the alert, lest
through some slight change in facial expression he assist 
the subject in making his response. The deaf or hard-of- 
hearing child is continuously seeking visual clues and an 
“arched eyebrow” or the “flicker of an eyelash” may 
speak volumes to him. 

The writer has made no attempt to compare the intel- 
lectual development of deaf and hard-of-hearing children 
with that of hearing children. The deaf child’s training 
probably will never be identical with that of the hearing 
child. The writer is of the opinion that the question of 
primary importance is not, “How does the deaf child rank
in comparison with the hearing child?”, but rather, 
“How does the deaf child rank in comparison with other 
deaf children of his chronological age?” 


Development and Standardization of the Scale 

Preliminary study of deaf and hard-of-hearing chil- 
dren in school. In order to obtain a more adequate under- 
standing of the group, the writer made an intensive study 
of deaf children as they actually went about their school 
work. 

For a period of more than four months the writer 
spent three days every two weeks with these pupils in a 
residence school for the deaf. Not only did he visit them 
at their class work but he lived with them at the school 
and associated with them on the playground, in the gym- 
nasium, and elsewhere. A complete record was made of 
the activities which took place in the classroom and also 
of those of an extra-curricular nature. This type of study 
yielded a multitude of suggestions which were of the 
utmost importance in the construction of the scale. 

The selection and construction of test items. Every 
item of the scale was considered in light of the following 
criteria: (1) Was the item similar to the task, or tasks, 
which the young deaf child did in school? (2) Was it the 
type of item which could be included in a non-verbal 
test? (3) Could the item be presented in such a way that
directions could be given through simple pantomime?
(4) Was it the type of item which experience had shown 
to yield high correlation with acceptable criteria of intel- 
ligence or learning ability? (5) Could the item be con- 
structed and presented in such a way that the child could 
give a definite response, thus making the scoring objective 
and easily done? (6) Would the item be appealing or 
attractive to the subject? (7) Could the item be scored 
without the score being based on time? (8) Did the diffi- 
culty of the item appear to be within the age range of the 
standardizing group? (9) Did the item seem likely to 
show a high discriminative capacity? 


In many instances, in order to meet all the above 
criteria, it was necessary to devise special methods of 
constructing or assembling the parts of an item. The preliminary
scale was composed of 18 different types of items
with a total of 204 items. 


The use of the preliminary scale. This scale was given 
to seventy-three pupils of the Iowa School for the Deaf,
whose ages ranged from three years ten months to nine 
years eight months. Owing to the length of the scale, it 
was divided into two parts and half of the group was 
given Part A first and the other half of the group was 
given Part B first. The two parts were given not less 
than one day nor more than one week apart. In several
instances items were scored in detail, thus permitting a 
later rescoring on a different basis. 

After members of this tryout group were tested, an 
item analysis was made and curves of the percentage 
passing each successive chronological age were plotted. 
This was done for each of the 204 individual items of the 
scale. The steepness of these curves afforded a graphic 
indication of the validity of the items. The items which 
appeared to function the most satisfactorily and to most 
nearly approximate the criteria were retained. The cri- 
teria used were: (1) validity (based on the percentage 
passing from one age to the next); (2) ease of administering;
(3) ease and objectivity of scoring; (4) attractiveness
or interest to the subject; (5) variety; and (6) time
of administering. When the sifting process was com- 
pleted, 11 types of tests were retained, including a total 
of 124 individual items. 
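
The item analysis described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the record layout (one age and one pass/fail mark per child for a single item), the function names, and the sample responses are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def percent_passing_by_age(records):
    """Return {age: per cent passing} for one item.

    `records` is an iterable of (age_in_years, passed) pairs. This
    layout is hypothetical, not the author's tabulation format.
    """
    tested = defaultdict(int)
    passed_count = defaultdict(int)
    for age, passed in records:
        tested[age] += 1
        if passed:
            passed_count[age] += 1
    return {age: 100.0 * passed_count[age] / tested[age] for age in sorted(tested)}


def age_to_age_gains(curve):
    """Increase in per cent passing from each age to the next."""
    ages = sorted(curve)
    return [curve[older] - curve[younger] for younger, older in zip(ages, ages[1:])]


# Invented responses to a single item, for illustration only.
records = [(4, False), (4, False), (4, True), (4, False),
           (5, True), (5, False),
           (6, True), (6, True), (6, True), (6, False)]
curve = percent_passing_by_age(records)
gains = age_to_age_gains(curve)
print(curve)
print(gains)
```

A steep curve corresponds to large positive gains between successive ages; an item whose gains are small or negative would be a candidate for rejection on the first criterion.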


The test items. A brief description of the items may
make later discussions more meaningful. The types are
as follows:

1. Memory for Colored Objects—Two sets of eight colored sticks
each, one set for the examiner and one set for the subject. The
examiner presents from one to five of the sticks from his group and
then removes them and the subject must select the corresponding
sticks from his set from memory.

2. Bead Stringing—At the lower levels scoring is based on the number
of beads strung during a two-minute period. The intermediate
level demands the correct copying of bead patterns, while at the
upper level the subject is rated on his ability to reproduce patterns
from memory.

3. Pictorial Associations—This includes 12 series of pictures. In
each series two pictures are mounted side by side and a recess is
left for the insertion of the third picture which is associated with the
first two. This third picture must be selected from a group of
four unmounted pictures. (There are four unmounted pictures for
each series.)

4. Block Patterns—A set of eight drawings of block patterns and 16
blocks. The patterns are arranged in order of difficulty and the
subject must construct the pattern shown in the drawing.

5. Memory for Digits—Two sets of nine numbers each. The examiner
presents a number series and the subject must reproduce it from
memory.

6. Completion of Drawings—A series of 15 pictures, each with a part
missing. The subject must draw the missing part and thus complete
the picture.

7. Pictorial Identification—Six series of mounted pictures. Each series
has five pictures of a similar nature which are mounted side by
side. Four individual pictures which are duplicates of the mounted
pictures must be correctly identified by matching them with the
corresponding mounted picture.

8. Paper Folding—Six-inch squares of paper which must be folded
by the subject to duplicate (seven) patterns.

9. Visual Attention Span—Several series of pictures (varying from
one to six pictures each) and 15 individual pictures. The subject
is shown a picture series and he must use the individual pictures to
reproduce the presented series from memory.

10. Puzzle Blocks—Eight sets of variously shaped pieces of wood. Each
set can be put together to form a block.

11. Pictorial Analogies—Ten series of pictures with three pictures in
each series mounted and four pictures to use as choices. The first,
second, and third pictures of the analogy are mounted and a recess
is left for the insertion of the fourth picture which completes the
analogy. The subject must select the latter picture from among the
four available choices.


Use of the provisional scale. In addition to the work 
done with the pupils of the Iowa school, the test was 
administered in the state schools for the deaf in Nebraska, 
Kansas, Missouri, Illinois, Indiana, and Ohio, as well as 
to the members of the Lincoln, Nebraska, Day School. 
All students, except a few who were ill, who were under 
10 or who had had their tenth birthday within 15 days 
of the examination date were tested. The test was admin- 
istered to 466 individuals. The standardizing group is 
limited in numbers at the age of four and below, since 
most schools do not accept children until they are five or 
six years of age. 

Derivation of the final scale. To save time and to 
guarantee greater accuracy in the statistical data on which 
the final selection of items would be based, Hollerith 
techniques were used. By means of the Hollerith sorter 
and counter it was possible to determine quickly the num- 
ber of individuals who were successful on each item in 
successive ages throughout the range and thus to plot for 
each item the curves of percentage passing. Items were 
selected chiefly on the basis of discriminative ability, this 
judgment being based on the increase in percentage pass- 
ing from one age to the next. The items in each group 
were next arranged in order of difficulty, this order being 
based on the percentage of the total group passing each 
individual item. 
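
The ordering step can be illustrated as follows. This is a minimal sketch, not the Hollerith procedure itself: the item labels and pass counts are invented, and only the group size of 466 comes from the text.

```python
def order_by_difficulty(pass_counts, group_size):
    """Arrange items from easiest to hardest.

    Difficulty is taken as the per cent of the total group passing
    each item, so the item passed by the most children comes first.
    """
    percent = {item: 100.0 * n / group_size for item, n in pass_counts.items()}
    return sorted(percent, key=lambda item: percent[item], reverse=True)


# Hypothetical pass counts for three items of one type, out of the
# 466 children in the standardizing group.
pass_counts = {"item_a": 120, "item_b": 400, "item_c": 300}
ordered = order_by_difficulty(pass_counts, 466)
print(ordered)
```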







To develop the table of norms, curves were plotted 
showing for each age group the percentages making each 
possible total score for each group of items. The score 
necessary for passing each item at a certain age level was 
considered to be that score which was made by approxi- 
mately 70 per cent of the particular group. In all in- 
stances the percentages were plotted, the curves were 
smoothed, and the ends were extended to obtain what 
might be termed “projected norms” at the extremes. This 
smoothing of the curves of percentages gives a somewhat 
truer indication of the ability level of the four-year-old 
group. 

To determine who should compose the four-year-old 
group, or the five-year-old group, etc., it was decided to 
classify all individuals as four whose ages were between 
three years six months and four years five months and as 
five those who were between four years six months and 
five years five months, and so on. In no instance does the 
mean chronological age of the standardizing group devi- 
ate more than one month from the desired or true mean. 
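
The grouping rule amounts to rounding the chronological age to the nearest whole year, with the half-year point going to the higher group. A small sketch follows; the function name is hypothetical, and the code does not reproduce the separate labelling of the top group (9-6 to 10-0), which the tables report as 9-9.

```python
def age_group(years, months):
    """Age group under the rule in the text.

    3-6 through 4-5 is classed as four, 4-6 through 5-5 as five,
    and so on.
    """
    total_months = 12 * years + months
    return (total_months + 6) // 12


print(age_group(3, 6), age_group(4, 5), age_group(4, 6))
```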


The unit of measurement. Perhaps the most common 
method of interpreting scores on a scale such as this one 
is the familiar Binet type mental age. This is the method 
of using age norms and the amount of mental develop- 
ment in a year as the unit of measurement. Age norms 
are established for raw scores and are converted into, or 
interpreted as, mental ages. This age-type score, repre- 
senting the amount of development up to date, has much 
greater meaning to the layman than does the “standard 
score” or the “percentile score” and for that reason the 
age norm has been used in this scale. However, the term 
“mental age” has not been used because the M.A. would 
undoubtedly suggest a Binet Mental Age which in turn 
would suggest the corresponding M.A. of the hearing 
child and thus lead to false comparisons. For this reason 
and because of the fact that the test items have been 
selected, in many instances, because of their similarity to
the abilities which the deaf child must exhibit in school,
the term “Learning Age” is used instead. 

An L.A. of 5-0 simply means that, according to the 
results of this test, the child is able to do those tasks which 
the average deaf child of five years is able to do, or, that 
he should be able to solve problems with the same average 
efficiency as the average deaf five-year old. 

It is recommended that in the interpretation of test 
results, the learning age be used instead of the learning 
quotient (L.Q., derived by dividing the L.A. by the C.A.; 
similar to the I.Q.). Until more conclusive evidence 
regarding the respective influence of environment and 
heredity on the mental development of the child, resulting 
from more carefully controlled experiments, is produced, 
one must proceed cautiously to insure that he is not clos- 
ing the door of opportunity to any child. If there is a 
reasonable question as to whether the hearing child can 
be improved through a stimulating program of training, 
is it not likely that this question will assume even larger 
proportions in the case of the deaf child? 
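
For readers who wish to see the arithmetic, the quotient may be sketched as below. The text defines the L.Q. only as the L.A. divided by the C.A.; the multiplication by 100 is an assumption, made here by analogy with the I.Q. The "years-months" notation follows the article, and the function names are hypothetical.

```python
def to_months(age):
    """Convert an age written as 'years-months' (e.g. '5-6') to months."""
    years, months = age.split("-", 1)
    return 12 * int(years) + float(months)


def learning_quotient(learning_age, chronological_age):
    """L.Q. = 100 * L.A. / C.A. (the factor of 100 is assumed)."""
    return 100.0 * to_months(learning_age) / to_months(chronological_age)


# A child of C.A. 5-0 who earns an L.A. of 5-6:
print(learning_quotient("5-6", "5-0"))
```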


Statistical Analysis of Test Data 


The accuracy of any test is dependent, not only upon 
test items employed, but also upon the number of indi- 
viduals examined, the representativeness of the group, the 
accuracy with which the test has been scored, the derivation
of accurate and meaningful norms, and various statistical
applications which are used for the purpose of
checking, or for interpretation. An additional and more 
detailed statistical treatment of the data will be made at 
some later date. Such topics as sex differences, effects of 
schooling, relation of score to degree of hearing loss,
resemblances of score to teacher judgment of ability, and 
the results of a factorial analysis of the test items, are 
among those so reserved. 

Adequacy of the standardization. Perhaps the main
criterion for the standardizing of any test is the selection 
of representative populations at each age. The method
employed in meeting this problem has been described
briefly above, i.e., the testing of all available pupils 
(within the desired age range) in a rather widely scat- 
tered group of state schools for the deaf. However, to 
check the adequacy of the sampling of cases, a table of 
percentages of scores for each item was made which did 
not include the students of the Indiana school and the 
Ohio school. From this list of percentages, a table of 
norms was derived. These norms were then compared 
with the norms derived from the total group. In 89 
per cent of the cases, the norms were found to be iden- 
tically located and in the remaining 11 per cent of the 
cases they varied not more than six months. This would 
indicate that the sampling was probably sufficient for 
determining relatively stable norms. 
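
The comparison of the two sets of norms can be pictured as follows. This is only a sketch: the norm entries are invented, each keyed by a hypothetical (item type, score) pair and giving a learning age in months. It does not reproduce the 89 per cent agreement reported above.

```python
def compare_norms(full, reduced):
    """Return (per cent of entries identical, largest difference in months)."""
    keys = sorted(set(full) & set(reduced))
    diffs = [abs(full[k] - reduced[k]) for k in keys]
    identical = 100.0 * sum(1 for d in diffs if d == 0) / len(diffs)
    return identical, max(diffs)


# Invented norms: learning age in months at which each score is placed.
full_group = {("Paper Folding", 3): 60, ("Paper Folding", 4): 66,
              ("Puzzle Blocks", 2): 72, ("Puzzle Blocks", 3): 84}
without_two_schools = {("Paper Folding", 3): 60, ("Paper Folding", 4): 66,
                       ("Puzzle Blocks", 2): 72, ("Puzzle Blocks", 3): 90}
agreement = compare_norms(full_group, without_two_schools)
print(agreement)
```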


TABLE 1
YEARS-IN-SCHOOL DISTRIBUTION BY AGES AND A COMPARISON OF THE
MEAN C.A.’s AND THE MEAN L.A.’s FOR THE STANDARDIZING GROUP

                      Years in School
Age         0     1     2     3     4     5     6   Total   Mean C.A.   Mean L.A.
4           9     1                                    10     4-1         4-4.8
5          33     9                                    42     5-0.7       5-1.8
6          39    16     4     1                        60     6-0.3       6-3.5
7          22    42    15     4     1                  84     7-0         7-2.3
8          11    31    27    14     4                  87     7-11.4      8-0.7
9           6    29    43    31     5     3           117     8-11.7      9-2.9
9-9         5     9    15    17    12     6     2      66     9-9         9-6.5

TOTAL     125   137   104    67    22     9     2     466





Table 1 gives the number of individuals tested at each
chronological age level. The small number of cases in the
lowest age group means that the norms will be less reliable
at the age of four. This table also shows the mean
chronological age and the mean learning age for each of 
the age groups. In no instance does the mean chronolog- 
ical age differ more than one month from the desired 
chronological age. The mean learning ages likewise cor- 
respond closely to the mean chronological age. At each 
age level, except one, the mean L.A. is slightly higher 
than the mean C.A. It is felt that this is a desirable feature.
If there were an inadequate sampling of subjects,
it would likely be of the group with limited ability, since
the mentally less advanced are less likely to have entered
school at a reasonably early age than are those mentally
advanced. The test ceiling is not high and this may be
responsible for the fact that the group at the upper end
of the range has a mean L.A. slightly below their mean
C.A. In only two instances do the two means deviate by
as much as three months, the greatest deviation being 3.8
months at the four-year level.

TABLE 2
PER CENT OF EACH AGE GROUP MAKING EACH SCORE ON EACH TYPE OF ITEM

BEAD STRINGING
          Total per 2 Min.               Total Patterns
Age      7-8  9-10 11-12 13-14     I    II   III    IV     V
4        100    90    40    30   [?]
5        100    91    79    55    60    29    12
6        100    97    94    87    92    47    20     5
7        100    99    99    97    96    87    45    12     2
8        100   100   100   100   100    95    67    32     9
9        100   100   100   100   100    99    88    57    19
9-9      100   100   100   100   100   100    89    70    24

BLOCK BUILDING (Total Score)
Age        1     2     3     4     5     6     7     8
4        100   100    30    10    10
5        100   100    81    40    17     2
6        100   100    98   [?]   [?]   [?]   [?]
7        100    99    99    90    73    48    24   [?]
8        100   100    99    94    85    69    44   [?]
9        100   100    99    98    95    83    66   [?]
9-9      100   100   100   100    98    95    80   [?]

PICTORIAL ASSOCIATIONS (Total Score)
Age        1     2     3     4     5     6     7     8     9    10    11    12
4        100   100   100    90    60    20    20
5        100   100   100    93    79    40    26    12     2
6        100   100   100    97    90    80    62    43    20     8     2
7        100   100    99    99    95    89    83    67    54    23     6     1
8        100   100   100    99    99    97    94    85    67    51    18     6
9        100   100   100   100    99    99    97    96    82    73    39    12
9-9      100   100   100   100   100   100   100   100    91    82    58    14

PAPER FOLDING (Total Score)
Age        1     2     3     4     5     6     7
4        100   100   100    40    10
5        100   100   100    88    71    26
6        100   100   100    97    90    72   [?]
7        100   100   100    98    95    89   [?]
8        100   100   100    99    99    96   [?]
9        100   100   100   100   100    99   [?]
9-9      100   100   100   100   100   100   [?]

MEMORY FOR COLORED OBJECTS (Total Score)
Age      5-6   7-8     9    10   [?]   [?]   [?]   [?]   [?]   [?]   [?]
4        100    90    40    20    10
5        100    98    62    26   [?]   [?]   [?]
6        [?]
7        100    99    98    89    77    54    31    13     6     1
8        100   100    99    93    84   [?]    60    40    22    11     3
9        100   100   100    97    91    85    73    56    43    27   [?]
9-9      100   100   100   100   100    95    80    62    48    29    17

MEMORY FOR DIGITS
           Part A Totals       Not in Order
Age        3     4     5     B     C     D     E
4        100    80    10    10
5        100    88    45    64    10    10   [?]
6        100    95    87    97    65    10   [?]
7        100    99    97    98    83    38   [?]
8        100   100   100   100    94    59   [?]
9        100   100   100    99    94    79   [?]
9-9      100   100   100   100    98    83   [?]


TABLE 2 (continued)
PER CENT OF EACH AGE GROUP MAKING EACH SCORE ON EACH TYPE OF ITEM

PICTORIAL IDENTIFICATION (Total Score)
Age      1-2   3-4   5-6   7-8  9-10 11-12 13-14 15-16 17-18 19-20 21-22 23-24
4        100   100   100   100   100    90    80    50    40    10
5        100   100   100   100    98    95    93    79    60    33    10     5
6        100   100   100   100   100    98    93    80    65    48    29    23
7        100   100   100   100    99    99    98    96    94    85    75    60
8        100   100   100   100   100   100   100    99    98    98    95    87
9        100   100   100   100   100    99    99    98    98    98    98    93
9-9      100   100   100   100   100   100   100   100   100   100   100    98

VISUAL ATTENTION SPAN (Total Score)
Age        1     2     3     4     5     6
4        100    70    20    10
5        100    73    29     2
6        100    95    72    35    12     2
7        100    99    90    57    21     3
8        100   100    98    77    36    11
9        100   100    98    85    51    26
9-9      100   100   100    88    70    32

PUZZLE BLOCKS (Total Score)
Age        1     2     3     4     5     6     7
4         90    30    10   [?]
5        100    95    55    12
6        100    98    80    52    12
7        100    98    87    73    38     5
8        100   100    99    89    64    22   [?]
9        100   100   100    97    77    54    14
9-9      100   100   100   100    89    62    21

PICTORIAL ANALOGIES (Total Score)
Age        1     2     3     4     5     6     7     8     9    10
4        [?]
5        100    76    71    43    14     5     2
6        100   100    95   [?]   [?]    30    13     2     2     2
7        100    99    96    89    79    52    27    10     1
8        100   100   100    98    87    63    43    28     2
9        100   100   100    98    94    91    76    56    26     7
9-9      100   100   100   100    98    95    83    64    30     8

MEMORY FOR DIGITS (continued)
                In Order
Age      [?]   [?]   [?]
4        [?]
5        [?]
6         88    33     3
7        [?]    55    13
8         80    32     6
9         83    53    24
9-9       97    73    32

COMPLETION OF DRAWINGS (Total Score)
Age        1     2     3     4     5     6     7     8     9    10    11    12    13    14
4        [?]
5         67    44    36    19    17    12     7     2     2     2
6         98    93    87   [?]    70    55    38    25    20    10     7     2
7         98    96    94    87    85   [?]   [?]    67   [?]    43    30    19     7
8        100   100    99    98    95    95    94    92    88    74    53    34    16     7
9        100   100   100   100    99    99    97    96    88    86    79    71    47    16
9-9      100   100   100   100   100   100   100   100   100    97    94    83    61    23











Table 2 gives the percentages of subjects in each age 
group making each possible total score for each type of
test item. These per cents were the ones used in plotting 
the curves of per cents. These curves were smoothed and 
the score which revealed approximately 70 per cent of 
success was taken as the score of the average person of 
that chronological age. It was then entered in the table 
of norms under that learning age. The learning ages 
given below 4-0 and above 10-0 are the results of an extension
process and, as has been mentioned before, are not so
reliable as are those within the age range of the subjects
examined. 
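
The 70 per cent rule may be illustrated with the Visual Attention Span percentages of Table 2. The sketch below is a simplification under stated assumptions: it applies the rule to the raw per cents, without the smoothing of curves that the writer used; it takes "approximately 70 per cent" as "at least 70 per cent"; and it assumes that the rows of that part of the table run from age 4 to age 9-9 in the usual order. The function name is hypothetical.

```python
# Per cent of each age group making each Visual Attention Span score
# (scores 1, 2, 3, ...), as read from Table 2.
VISUAL_ATTENTION_SPAN = {
    "4": [100, 70, 20, 10],
    "5": [100, 73, 29, 2],
    "6": [100, 95, 72, 35, 12, 2],
    "7": [100, 99, 90, 57, 21, 3],
    "8": [100, 100, 98, 77, 36, 11],
    "9": [100, 100, 98, 85, 51, 26],
    "9-9": [100, 100, 100, 88, 70, 32],
}


def norm_score(percents, threshold=70.0):
    """Highest score reached by at least `threshold` per cent of the group."""
    best = 0
    for score, percent in enumerate(percents, start=1):
        if percent >= threshold:
            best = score
    return best


norms = {age: norm_score(p) for age, p in VISUAL_ATTENTION_SPAN.items()}
print(norms)
```

Because the unsmoothed per cents are used, the result need not agree with the entries actually printed on the record blank.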

Validity. The very methods by which the test items 
have been selected and retained are evidence of their 
validity. It will be recalled that the items were selected 
according to rather definite criteria and that after they 
had been given they were subjected to a rigorous item 
analysis. Thus the chief criteria for validity were: (1) 
selection—through critical analysis and adherence to cri- 
teria, and (2) increase in the percentage passing from 
one age to the next. In the present scale, it was impossible 
to determine validity through correlations with other test 
scores inasmuch as there is no existing test which would 
have been an acceptable criterion. In the absence of the 
needed criterion, correlations were computed between 
the score on the entire scale (the score on the entire scale 
is the median learning age of the learning ages obtained 
on the several parts of the scale) and the score on each 
group of items. The correlation setup is seemingly a 
spurious one since a part of the test has been corre- 
lated with the whole test which includes this part. As 
the score on the entire scale is the median score of the 
parts of the scale, however, each part has an approxi- 
mately equal share in producing this total or final score 
and this in turn lessens or eliminates the possibility of
the obtained correlations being spuriously high. Since
the correlations between the learning age obtained on each
group of items and the median learning age on the entire
scale are within the range of from .629 to .843, they are
evidence of high internal consistency and thus, perhaps,
of high item validity.

TABLE 3
CORRELATIONS BETWEEN THE LEARNING AGE OBTAINED ON ONE
SECTION OF THE TEST AND THE MEDIAN LEARNING AGE
OBTAINED ON THE ENTIRE TEST

                                      Group I        Group II
                                    (Age 4 to 7)   (Age 8 to 10)
 1. Memory for Colored Objects         .804           .740
 2. Bead Stringing                     .812           .729
 3. Pictorial Associations             .643           .693
 4. Block Patterns                     .797           .718
 5. Memory for Digits                  .755           .773
 6. Completion of Drawings                            .702
 7. Pictorial Identification           .780
 8. Paper Folding                      .843
 9. Visual Attention Span              .637           .629
10. Puzzle Blocks                                     .734
11. Pictorial Analogies                               .742
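
The two computations involved, the median learning age for the whole scale and the correlation of one part with it, can be sketched as follows. The learning ages below are invented (in months, one row per child and one column per group of items), and the product-moment formula is assumed, since the text does not name the coefficient used.

```python
from math import sqrt
from statistics import mean, median


def total_learning_age(part_ages):
    """Score on the entire scale: the median of the part learning ages."""
    return median(part_ages)


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical learning ages in months on five groups of items.
children = [
    [48, 54, 50, 60, 52],
    [60, 66, 58, 72, 62],
    [72, 70, 80, 76, 74],
    [84, 90, 82, 96, 88],
]
totals = [total_learning_age(row) for row in children]
first_part = [row[0] for row in children]
r = pearson_r(first_part, totals)
print(totals)
print(round(r, 3))
```

Since a median is moved little by any single part, no one part dominates the total, which is the point made above regarding spuriously high part-whole correlations.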


The abbreviated scale. To determine whether a de- 
pendable short scale could be assembled, the five types of 
items which showed the highest correlation with the median
learning age for the entire scale were selected to form
the abbreviated scale. Since some of the groups of items
do not function over the entire age range, correlations 
were derived separately for two groups. Group I was 
composed of all members of the standardizing group who 
were seven years or under, and Group II, those who were 
from eight to 10 years of age. For Group I, correlations 
with the total scale were obtained for all groups of items 
except those which do not function at the lower levels, and 
for Group II, correlations with the total scale were ob- 
tained for all groups of items except those which do not 
function at the higher levels. The test booklets were
rescored on the basis of these abbreviated scales and cor- 
relations were found between the median learning ages 
obtained from the abbreviated scales and the original 
scale. The correlation for Group I was .944 and for 
Group II .936. Thus, when time limitations make it nec- 
essary, the short forms may be used with a considerable 
degree of confidence. These abbreviated forms can be 
given in approximately 30 minutes. 


Although it is recommended that the learning age be
used in preference to the learning quotient (L.Q. = 
L.A. / C.A.), in order to make the study more complete 
and significant the writer has made a study of the L.Q.’s 
of the standardizing group. Table 4 shows that the mean 
learning quotients derived from the standardizing group 
closely approximate the desired mean of 100. The greatest
deviation is at age four and is probably due to the
small number of cases and to the fact that they are a
somewhat select group. 


TABLE 4
THE MEAN LEARNING QUOTIENT, RANGE, STANDARD DEVIATION
OF THE L.Q.’s, AND STANDARD ERROR OF THE MEAN FOR EACH
AGE LEVEL OF THE STANDARDIZATION GROUP

          Mean     Range of       σ          σ
Age       L.Q.     L.Q.’s        L.Q.        M
4        108.5     94-127      10.909     3.4500
5        102.7     80-124      10.518     1.6228
6        104.4     65-139      14.470     1.8681
7        103.4     43-132      15.300     1.5912
8        101.7     65-137      15.135     1.6226
9        104.0     55-134      14.361     1.3277
9-9       99.8     73-120      11.410     1.4040





The mean L.Q. at each age level except the upper 
group (9-6 to 10-0) is slightly above 100. As has been 
mentioned before, this is probably a desirable feature. 
The lower mean of the upper group is apparently the re- 
sult of the limited test ceiling. The standard deviations at 
each age agree closely, except at the two extremes where 
the attenuating factors before mentioned have influenced 
them. In general, the standard deviation of the means is 
approximately 1.6 (disregarding the four-year group). 
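
The standard errors of Table 4 can be checked against the group sizes of Table 1. The sketch below assumes the familiar formula, the standard deviation divided by the square root of the number of cases, which the article does not state. Under that assumption six of the seven printed values are reproduced to within about .001; the age-7 row is not (the formula gives about 1.67 against the printed 1.5912), and the sketch leaves that difference unexplained.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a mean: sd / sqrt(n) (assumed formula)."""
    return sd / sqrt(n)


# age: (standard deviation of L.Q., number of cases, printed standard error)
ROWS = {
    "4": (10.909, 10, 3.4500),
    "5": (10.518, 42, 1.6228),
    "6": (14.470, 60, 1.8681),
    "7": (15.300, 84, 1.5912),
    "8": (15.135, 87, 1.6226),
    "9": (14.361, 117, 1.3277),
    "9-9": (11.410, 66, 1.4040),
}

for age, (sd, n, printed) in ROWS.items():
    print(age, round(standard_error(sd, n), 4), printed)
```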

Every effort has been made to make the test usable 
and yet have the mechanics as simple as possible. The 
record blank (Table 5) has been no exception and has 
been patterned after the one devised by Hildreth and 
Pintner. The record blank is in reality a table of norms 
and the various scores are checked on the blank and the 
median score is calculated. The items of the abbreviated 
scales also are indicated on the blank. 

The test items not only are attractive to young deaf
children but also they have a rather high discriminative
value. The scale is not difficult to administer or score and
since it is weighted heavily with tasks similar to those
which the deaf child must do in the early years of his educational
career, it should be extremely valuable for gaining
a better understanding of the abilities of the younger
deaf children. This is not intended to imply that the
inexperienced person could give the test satisfactorily.
The person who is unfamiliar with individual testing
techniques would have considerable difficulty unless he
underwent a period of training or practice with this scale.
It is quite conceivable that the person who has had some
experience in individual testing and who has some knowledge
of deaf children could, after a period of training in
which he gave six or eight practice tests, administer the
scale quite satisfactorily.

TABLE 5
RECORD BLANK FROM NON-VERBAL TEST OF LEARNING APTITUDE
(Especially adapted for young deaf children)

[Record blank: entry spaces for school, age, birth date, examiner,
and date; a learning-age scale in half-year steps running up to 11-6;
and a row of norm scores for each of the eleven types of items, from
Memory for Colored Objects through Pictorial Analogies. Footnote
symbols mark the items of the abbreviated scale for ages 4 to 7 and
of the abbreviated scale for ages 8 to 10.]

















PERFORMANCE TESTING IN PUBLIC 
PERSONNEL SELECTION 


PART I 


SIDNEY W. KORAN¹


Employment Board, Pennsylvania Department of Public Assistance 


Introduction 


IT IS an interesting fact that although the use of performance
tests in the selection of public personnel
enjoys not only the general endorsement of personnel tech- 
nicians but the enthusiastic and unsolicited support of the 
public as well, there is probably no other aspect of the 
examination process at present more completely neglected 
by the majority of merit system agencies. 

Probably all jurisdictions employ performance tests
in the selection of typists and stenographers, and the general
practice is to convert the ratings on these tests into
quantitative terms capable of combination with scores
achieved in other portions of the examination battery.
Beyond that, however, the performance testing of most
agencies seems seldom to go beyond the administration of
qualifying tests to a sufficient number of individuals at
the top of certain registers to satisfy immediate certification
requirements. Except for the case of tests of typing
and stenography, the technique of using the performance
test as a major part of the test battery (that is, as a factor
which may decidedly influence an examinee’s relative
standing on the eligibility register) appears to have been
almost completely ignored.

¹ The author desires to express his appreciation to the following individuals:
Mrs. Ruth Glenn Pennell, and Mr. Robert Hall Craig, members of the Employment
Board; Miss Hilda P. Thompson, the Board’s Executive Director; Dr. C.
H. Smeltzer, the Board’s Technical Consultant; Miss Kathleen Oyster, Traffic
Representative of the Bell Telephone Company’s Harrisburg office; Mr. Andrew
S. Hay, Service Supervisor of the IBM Harrisburg office; Mr. Bernard Gehring
of the Multigraph Sales Agency in Harrisburg; and Miss Alice I. Thompson,
of the Penn State Alumni Association.

There are, of course, various reasons why this situation 
exists. Associated with the technical difficulties inherent 
in the construction and administration of the performance 
test (difficulties which, incidentally, are frequently not
nearly so “insurmountable” as they at first appear) may
be the factors of cost and already overburdened technical
staffs. In addition, newly created agencies frequently face 
time deadlines which all but preclude their going beyond 
the commonly accepted minimum selection elements, 
namely: the use of minimum requirements, a written test, 
an evaluation of training and experience, and, for certain 
positions, an oral interview. In addition to these factors, 
however, and probably overshadowing them in effect, 
must be mentioned two others: general inertia and, prob- 
ably very closely related, an uncritical adherence to time- 
honored examination patterns considered satisfactory in 
selecting persons for jobs not requiring the possession of 
manual skills. 

Considerable progress has already been made in the de- 
velopment of performance tests for the selection of typists 
and stenographers. Since their use is so widespread, no 
further attention will be devoted to them in this discus- 
sion beyond pointing out that, despite their popularity, 
much necessary work still remains to be done toward their 
improvement, especially in the development of (1) suit- 
able standards of performance, (2) satisfactory scoring 
procedures, and (3) improved standardized techniques 
for administering the stenographic portion of the test.²





² The Chicago Park District’s use of phonographic recordings and the novel
experiments of the Buffalo Municipal Civil Service Commission and the Arizona 
Unemployment Compensation Merit System Council with radio broadcasting are 
examples of approaches to the problem of minimizing or eliminating the un- 
desirable effects of varying dictation speeds and other factors which characterize 
the use of numerous proctors. 




Techniques have also been developed for measuring 
performance in other jobs such as chauffeur and in certain 
skilled trades. Many companies test applicants for chauf- 
feur positions and most state motor vehicle bureaus give 
qualifying driving tests to applicants for operator licenses. 
The latter are ordinarily quite informally conducted, but 
Viteles has described “a trade test of driving skill”³ which
could quite readily be adapted to merit system use.⁴


The New York City Civil Service Commission’s ex- 
cellent pioneer work in developing tests for such skilled 
trades positions as welder, machinist, electrician, locksmith,
lineman, and carpenter is quite well known.⁵
Recently the State Technical Advisory Service of the 
Social Security Board began work on standardizing a 
performance test for Key Punch Operators. 


In general, however, the use of the performance test 
as a measuring instrument designed to serve not only as 
a qualifying hurdle, but also as an important factor in 
determining the examinee’s relative standing on the eli- 
gibility register has received much less attention than it 
deserves. The dearth of literature on the subject attests to 
this and probably contributes to the widely held feeling 
that performance tests capable of producing quantitative 
ratings are somehow exceptionally difficult to prepare 
and impractical to administer. 

It is hoped that this presentation will illustrate some
of the possibilities by describing four actual tests used
successfully by a medium-sized merit system agency, the
Employment Board of the Pennsylvania Department of
Public Assistance. Small jurisdictions may be able to use
some of the material with few or no changes. Larger
agencies, especially those with adequate technical staffs,
will probably want to develop their own examinations.
Even the latter, however, if their experience with this
type of test has been limited, may find it helpful to consider
another agency’s approach.

³ M. S. Viteles, Industrial Psychology (New York: W. W. Norton Company,
1932), 221-24.

⁴ The Los Angeles City Civil Service Commission has developed tests of
this kind for the positions of Auto Fireman, Ambulance Driver, and Motor
Truck Driver.

⁵ Fifty-sixth Annual Report, 1939 and First Half of 1940, Civil Service
Commission, City of New York.


Building the Performance Test 


The initial steps to be followed in constructing the 
performance test do not differ in any important respect 
from those basic to the construction of written tests. In 
both, the starting point is careful analysis of the job for 
which the test is to be designed. In constructing a per- 
formance test the analysis must include actual on-the-job 
observations of both the equipment and the persons doing 
the work. If the test constructor is not himself a compe- 
tent operator of the machine, it will not suffice for him to 
confine himself merely to study of the printed job speci- 
fications and technical literature and to conversations with 
experts and workers. Such an approach is inadequate 
even in the construction of written tests, where it is all too 
often the rule rather than the exception; as the only prep- 
aration to designing a performance test it can produce 
very unfortunate results. 

Every one of the foregoing steps—study of the job 
specifications as part of the agency’s classification plan, 
study of technical literature available on the equipment 
and on its operation, and conferences with skilled workers, 
supervisors, and acknowledged experts in the field—has 
its place in the procedure. That place is as a supplement 
to a first-hand acquaintance with the job itself. The pro- 
fessional test constructor must analyze the job sufficiently 
thoroughly to permit himself to identify the skills in- 
volved and to determine their relationship to one another 
and to the whole; and he must discover those individual 
differences which will provide him with essential clues to 
types of test items likely to prove valid in differentiating 
among various levels of performance ability. 


Here it may be worth pointing out that there is perhaps
no other phase of the examination program in which
the personnel technician is less likely to turn out a satis- 
factory job unless he consults with specialists who know 
the practical and technical aspects of the job to be tested. 
Both specialists—the personnel technician and the expert 
in the occupational field under consideration—bring to 
the task certain information and knowledge of techniques 
which need to be reconciled toward a common end, that 
of producing a valid measuring instrument capable of 
fulfilling the numerous practical considerations which 
public agencies cannot afford to forget. The test construc- 
tor will want to find an expert who knows the job and who 
is sufficiently progressive, adaptable, and interested in the 
problems of personnel selection to be cooperative and 
sympathetic. The length of time required to orient such 
a co-worker in the problems of testing will not be great, 
and the effort will pay big dividends in the form of a 
smooth working relationship, a valid measuring instru- 
ment, and a strong ally in the event of later criticism. 


It should be kept in mind that the performance test 
should: (1) be sufficiently long to include an adequate 
sample of the differentiating essentials of the job, (2) be 
as inexpensive and easy to administer as possible, (3) 
minimize possible differences in achievement resulting 
from lack of immediate familiarity with the particular 
model of equipment on which it is given, (4) appear 
sufficiently practical and comprehensive to create a favor- 
able impression among those who do not qualify as well 
as among those who do, (5) be capable of uniform administration
to all candidates, and (6) be objectively scored and
produce quantitative ratings. 


Setting the Passing Point 


Since one of the functions of the performance test is 
to eliminate candidates who do not demonstrate adequate 
ability to operate the equipment, passing points must be
established with considerable care. Fortunately, this
problem can usually be approached much more directly 
when performance tests are involved than it can with 
written tests. Practical considerations ordinarily make it 
impossible for most jurisdictions to employ standardized 
written tests or to develop satisfactory norms for the tests 
they construct. The usual practice, therefore, in agencies 
not bound by restrictive “70 per cent passing” legislation 
is to permit such factors as the following to influence the 
location of written test passing points: the number of 
examinees, the number of openings likely to occur during 
the life of the register, the general caliber of the compet- 
ing group, whether or not the examination battery in- 
cludes such other hurdles as a performance test or an oral 
interview, and previous certification experiences concern- 
ing the ratio of refusals to acceptances. 


In establishing the qualifying point for a performance 
test, on the other hand, the principal criterion must be an 
affirmative answer to the question, “Can the examinee per- 
form the task well enough to meet the employer’s mini- 
mum standards?” Production records are ordinarily 
available on types of work sufficiently similar to those 
sampled by the test to serve as the basis for setting the 
elimination point. Where such records are not available 
or are not in usable form, they can generally be obtained 
quite easily, and profitably, too, during the test tech- 
nician’s study of the job. 


While such data should usually serve as the principal 
basis for establishing the qualifying grade, they ought not 
to be the sole consideration. Some of the other factors, for 
example, which it is frequently important to note are: (1) 
the level of ability of the agency’s employees as compared 
with that of other persons doing the same kind of work; 
(2) the immediate, and possible future, condition of the 
labor market in the specific field under consideration and 
in related fields; and (3) the possible effects of nervousness,
atypicality of the test situation, and other factors
likely to be present and to lower the validity and reliability
of the examination.


Four Performance Tests


The remainder of this presentation will be devoted to
describing, with a minimum of discussion, some of the
forms and procedures developed in connection with performance
tests for the following four kinds of jobs: Telephone
Operator, Graphotype-Addressograph Operator,
Tabulating Machine Operator, and Duplicating Machine
Operator.

In considering the material, the reader should keep
the following facts in mind:

1. Each of the tests was designed for examinees who had “passed” a
previous hurdle, that of scoring above the 60th percentile in the
combination of their written test score and training-experience
rating.

2. The law under which the examining agency operates forbids the
establishment of any kind of minimum training and experience
requirements.

3. The operating agency prefers to make no provision for training new
employees for these positions and requires that a new appointee
be able to perform the duties of the position almost immediately.

4. Each of the tests was designed to serve as a qualifying examination
capable of weeding out individuals lacking in sufficient operating
ability to produce satisfactory work, and as a measuring instrument
capable of producing quantitative ratings of relative ability to
perform the work.

5. The validity of none of the tests has been determined through the
use of statistical procedures. For the present, their only claim to
validity is based on the fact that (1) experts in the fields covered
by the tests state that they measure what they purport to measure,
and (2) employees certified from registers established on the basis
of these tests have proved more uniformly satisfactory to their
employers than those certified from eligibility lists set up from
examination batteries which did not include performance tests.⁶

⁶ Studies of reliability and of the relation between performance test scores
and service ratings, written test scores, training scores, and experience scores
are now under way.
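The preliminary hurdle described in the first of these facts can be pictured with a short sketch. The article states only that examinees had to score above the 60th percentile on the combination of written test score and training-experience rating. It does not say how the two were combined or how the percentile was computed, so the simple sum and the "per cent of examinees scoring strictly lower" rule used here are assumptions, as are the function names and the figures.

```python
def percentile_rank(value, all_values):
    """Percentage of values in all_values that fall strictly below value."""
    below = sum(1 for v in all_values if v < value)
    return 100.0 * below / len(all_values)


def admitted_to_performance_test(candidates, cutoff=60.0):
    """Return the IDs of candidates whose combined score ranks above the cutoff.

    candidates maps an ID to (written_score, training_experience_rating).
    Adding the two parts together is an assumption; the article does not
    describe the actual method of combination.
    """
    combined = {cid: written + rating
                for cid, (written, rating) in candidates.items()}
    all_scores = list(combined.values())
    return sorted(cid for cid, score in combined.items()
                  if percentile_rank(score, all_scores) > cutoff)


# Hypothetical examinees: (written test score, training-experience rating)
hypothetical = {
    "A": (70, 10),
    "B": (82, 12),
    "C": (65, 20),
    "D": (90, 15),
    "E": (75, 5),
}
print(admitted_to_performance_test(hypothetical))
```

Because the comparison is strict, an examinee standing exactly at the 60th percentile is not admitted under this sketch; an agency could as reasonably choose the opposite convention.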


The Test for Telephone Operators 

This test was designed to be administered to examinees 
in the 20 counties of the state in which vacancies existed 
for the position of telephone operator in either the main, 
or a regional, office of a local Board of Assistance. The 
counties were unusually widely scattered geographically. 
The dial system was in use in 60 per cent of the 20 coun- 
ties and the manual system in the remainder, but the PBX 
boards were all of the cord-type. The switchboards in 90 
per cent of the offices were connected to four or more 
trunk lines; the largest two had 12 and 15 trunk lines 
respectively. The number of extension stations varied 
from 11 to 66, but 60 per cent had 20 or more. 

The preliminary survey of the physical facilities avail- 
able in each of the 20 counties for which eligibility regis- 
ters were to be established was accomplished by means of 
a letter in which a questionnaire containing the following 
questions was enclosed:

1. Is your PBX switchboard of the cord type?

2. How many city trunk lines do you have?

3. Is a dial part of the equipment of your switchboard?

4. Is the entire city in which your office is located equipped with dial
telephones?

5. How many extension stations do you have?

6. Is there an office near your switchboard in which there are two
separate extension lines (not two extensions of the same line)?

7. If so, is the office in which these two lines are located within hearing
distance of the ringer of a third line?

8. Would you be willing to permit us to use your switchboard for
the performance test if the test is scheduled on a Saturday afternoon
or on a week day evening when there are few or no business calls
likely to interfere?

Twelve examination centers were established. The
factors which dictated their selection were: (1) the type
of equipment available in each county for which a register
was to be established, (2) the relative proximity of
counties with similar equipment, (3) the distance each
examinee would be required to travel, and (4) the cost of
administration.


The test itself consisted of a series of 13 operating 
situations designed to determine the examinee’s ability to 
service incoming, outgoing, and extension calls and trans- 
fers of incoming and outgoing calls, all under as nearly 
normal operating conditions as were practically possible 
to achieve. The minimum equipment necessary for the 
administration of the test was a cord-type switchboard 
having four trunk lines and five extensions. Two forms of 
the test were required: one, Form D, for operators of 
PBX installations in communities where the dial system 
was in use, the other, Form M, for manually operated 
installations. 


The test was administered by two persons (designated, 
in the test, as Mr. Albert and Mr. Brown), one of whom 
was required to be well acquainted with the procedure 
and to have had some practice in its administration. The 
two examiners used separate extension telephones but 
were situated within earshot of each other and within 
hearing distance of a third extension telephone. 


The administration, recording, and scoring of the 
test were facilitated by the development of a combined 
“cue sheet” and rating form designed to serve the four- 
fold function of (1) indicating the sequence of opera- 
tions so that each examiner would know what his task 
was at every stage of the test, (2) listing the phrases to 
be repeated verbatim by both examiner and examinee, 
(3) enumerating the items on which the examinee was to 
be rated, and (4) providing spaces for the examiners’ rat- 
ings and comments. Each examiner was provided with a 
copy of this form and, as Mr. Albert or Mr. Brown, was 
required to originate, maintain, and terminate the calls 
assigned to him and to rate the examinee on each phase of 
every call coming to his attention. 


As an aid in orienting the examinee to the test situation,
she⁷ was provided with an Instruction Sheet (Exhibit A)
which set forth the general nature of the test she
was about to take and listed a few simple instructions such
as any experienced operator would need to be given on
starting a new job. When the examinee had been permitted
ample time to read the Instructions, she was assigned
to the switchboard. Several minutes, if necessary,
were then allowed to permit the examinee to familiarize
herself with any aspects of the board which were strange
to her and to note the location of the jacks and names
mentioned in the Instructions. When the examinee was
ready to begin, the receptionist told her to ring Mr.
Brown’s extension and to read her identification number
to him from her admittance slip. This operation, as well
as the first call placed by the examiner, was intended to
help “break the ice” and did not enter into the determination
of the examinee’s score.

⁷ The feminine pronoun is used because all examinees for the telephone
operator performance test were women.


Exhibit B is a reproduction of the test administered 
to examinees required to operate switchboards in dial- 
equipped communities. While the calls comprising the 
manual form of the test were similar in number and com- 
plexity to those included in the dial form, different cue 
and rating sheets were required because several of the 
operations (and, consequently, the points to be rated) 
were not the same for both systems. 


Some idea of the variety of realistic operating situa- 
tions existing during the administration of the test—de- 
spite the fact that all of the calls were originated by only 
two persons—may be gathered from an examination of 
some of the calls the operator is required to handle. In
call No. 4 (see Exhibit B), the operator connects Mr.
Albert’s extension to a city line. A few moments later, the
operator is telling the person whose call has come in on a
trunk line that Mr. Carson’s extension is busy (call No.
5). To maintain the connections required at this stage of
the test the operator has to put up seven cords. In the
four calls immediately following, the operator is required
to perform these tasks:


Call No. 6 
answer Mr. Albert’s extension; 
transfer the incoming call from Mr. Carson’s extension to Mr. 
Albert’s; 
take down the connection from one of the trunk lines and from 
Mr. Albert’s and Mr. Brown’s extensions. 
Call No. 7 
answer Mr. Brown’s extension; 
connect Mr. Brown’s extension to a city line so that Mr. Brown 
may dial his number through the central exchange. 
Call No. 8 
answer an incoming call; 
inform the person calling that Mr. Brown’s line is busy; 
hold the incoming call until Mr. Brown’s line is no longer busy. 
Call No. 9 
answer Mr. Albert’s extension; 
transfer the outgoing call from Mr. Brown’s extension to Mr. 
Albert’s extension; 
take down the connections from one of the trunk lines and 
from Mr. Albert’s and Mr. Brown’s extensions. 


The scoring procedure was designed (1) to permit 
the immediate elimination of candidates whose perform- 
ance fell below certain established minimum standards, 
and (2) to produce quantitative ratings reflecting the 
relative operating ability of the examinees who satisfied 
these minimum standards. Because both the level of diffi- 
culty of the duties and the relative ability of candidates 
to perform the duties varied in direct relation to the size 
of the county in which the jobs occurred, minimum stand- 
ards (based on the 12 calls comprising the test) were set 
on a class-county basis as follows: 


Class II Counties 10 completed calls 
Class III Counties 9 completed calls 
Class IV Counties 8 completed calls 
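The class-county minimums amount to a simple lookup, which the following sketch illustrates. The three thresholds and the figure of 12 scored calls come from the text; the function and constant names are hypothetical.

```python
# Minimum number of completed calls (out of the 12 scored calls) by county class.
MINIMUM_COMPLETED_CALLS = {"II": 10, "III": 9, "IV": 8}
SCORED_CALLS = 12


def meets_minimum(county_class, completed_calls):
    """Return True if the examinee completed enough calls for her county class."""
    if not 0 <= completed_calls <= SCORED_CALLS:
        raise ValueError("completed_calls must lie between 0 and 12")
    return completed_calls >= MINIMUM_COMPLETED_CALLS[county_class]


print(meets_minimum("II", 9))   # nine calls in a Class II county
print(meets_minimum("III", 9))  # nine calls in a Class III county
```

The same nine completed calls thus eliminate an examinee in a Class II county but qualify her in a Class III county.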


For the purpose of applying the minimum require- 
ments represented by these criteria, a call was considered 
“completed” if the operation or operations essential to 
recognition of that particular call were carried out sufficiently
well to receive credit. Thus, an Extension to
Trunk call was considered to have been completed if the 
examiner had granted credit for the ringing signal, a 
Trunk to Busy Extension call if credit had been granted 
for the busy report, Transferring an Incoming Call if the 
connection was maintained, etc. 
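The rule for a "completed" call can be sketched as a table of essential items, one per type of call. Only the three call types named above are filled in, since the text leaves the others under "etc."; the item labels follow the rating sheet in Exhibit B, and the function names and sample ratings are hypothetical.

```python
# Essential rating item for each call type named in the text.
# Other call types are omitted because the article does not specify them.
ESSENTIAL_ITEM = {
    "EXTENSION TO TRUNK": "Ringing signal",
    "TRUNK TO BUSY EXTENSION": "Busy report",
    "TRANSFERRING INCOMING CALL": "Connection maintained",
}


def call_completed(call_type, credited_items):
    """A call counts as completed if its essential item received credit."""
    if call_type not in ESSENTIAL_ITEM:
        raise KeyError(f"essential item not specified for {call_type!r}")
    return ESSENTIAL_ITEM[call_type] in credited_items


def count_completed(rated_calls):
    """Count completed calls in a list of (call_type, credited_items) pairs."""
    return sum(call_completed(kind, items) for kind, items in rated_calls)


# Hypothetical ratings for three calls
rated = [
    ("EXTENSION TO TRUNK", {"Promptness", "Dial tone", "Ringing signal"}),
    ("TRUNK TO BUSY EXTENSION", {"Promptness", "Hold line"}),
    ("TRANSFERRING INCOMING CALL", {"Transfer", "Connection maintained"}),
]
print(count_completed(rated))
```

In the second sample call the operator earns credit for promptness but not for the busy report, so the call does not count toward the minimum even though some of its parts were credited.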


The actual steps followed in scoring the test are 
enumerated in Exhibit C, which is a reproduction of the 
instructions furnished the scorers. Two copies of the 
scoring form (referred to in Exhibit C as EB-695) are 
reproduced as Exhibits D and E. The former is the rec- 
ord of an examinee who did not complete a sufficient num- 
ber of calls to qualify in her county (Class II). The latter 
represents the rating of an examinee in a Class III County 
who completed considerably more than the minimum re- 
quired in her county. 


The use of the schedule of credits shown in Exhibit F 
made it possible to convert the approximately 75 sub-operations
comprising the test into quantitative ratings.⁸ The
maximum attainable score for the test designed for opera- 
tors of dial equipment was 143, for operators of manual 
equipment, 123. To facilitate the scoring, keys were con- 
structed which turned the task into a routine operation 
easily performed by clerks experienced in scoring objec- 
tive tests. 
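The conversion of checked sub-operations into a quantitative rating can be sketched as a weighted sum. Exhibit F is not reproduced in this part of the article, so the credits below are purely hypothetical placeholders and not the Board's actual schedule; the function names are likewise assumptions.

```python
def weighted_score(checked_items, credit_schedule):
    """Sum the credits for every sub-operation the examiners checked.

    Both arguments are keyed by (call number, rating item).
    """
    unknown = [item for item in checked_items if item not in credit_schedule]
    if unknown:
        raise KeyError(f"items missing from the credit schedule: {unknown}")
    return sum(credit_schedule[item] for item in checked_items)


def maximum_score(credit_schedule):
    """Highest attainable score: every sub-operation credited."""
    return sum(credit_schedule.values())


# Hypothetical credits for one Extension to Trunk call (NOT Exhibit F values)
hypothetical_credits = {
    (2, "Promptness"): 1,
    (2, "Dial tone"): 3,
    (2, "Ringing signal"): 4,
}
checked = [(2, "Promptness"), (2, "Ringing signal")]
print(weighted_score(checked, hypothetical_credits),
      "of", maximum_score(hypothetical_credits))
```

With the full schedule in hand, a scoring key of this kind reduces the task to the routine clerical operation the text describes.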





Part II of this article will appear in the October issue of 
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT





⁸ The correlation between the number of calls completed and the score
derived by applying the schedule of credits shown in Exhibit F is naturally 
quite high. In a test comprising 20 or 25 calls it would probably be unnecessary 
to go to the added trouble of weighting and scoring each part of a call. How- 
ever, several considerations suggested the desirability of doing so in the par- 
ticular test described. 




Exhibit A 
COMMONWEALTH OF PENNSYLVANIA 
EMPLOYMENT BOARD 
of the 
DEPARTMENT OF PUBLIC ASSISTANCE 
Harrisburg 
PERFORMANCE TEST FOR TELEPHONE OPERATORS 
Series 1000 
August 1940 
INSTRUCTIONS TO EXAMINEES 
Important: Failure to follow instructions may 
result in disqualification from the examination. 

The examination you are about to take has been designed to test 
your ability to perform some of the tasks ordinarily required of a tele- 
phone operator in the Department of Public Assistance. 

When your turn arrives, you will be assigned to a PBX cord-type 
switchboard. The designation strips on this switchboard will indicate 
the location of four or more trunk lines and the following extension 
stations: 

Mr. Albert Mr. Carson 
Mr. Brown Mr. Drake 
Official 

You will be given a few minutes to familiarize yourself with any 
aspects of the switchboard which are strange to you and to note the 
location of each of the jacks indicated above. 

When the Proctor tells you to do so, ring Mr. Brown’s telephone. 
Mr. Brown will answer and ask you to repeat your Identification Num- 
ber—the number which appears on your Admittance Slip. 

The examination, which consists of making various combinations of 
simple connections, will then begin. The first connection you will be 
required to make will be a practice exercise on which you will not be 
graded. 

When answering calls or acknowledging orders, the following 
phrases must be used: 

Answering incoming calls: “Public Assistance.”
Answering extension calls: “Yes, please?”
Acknowledging orders: “Thank you.”

Note: On incoming calls, if the Calling Party requests information 
regarding the Department of Public Assistance or asks to talk to anyone 
besides the four persons whose names are shown on the designation strip, 
the call must be connected to the extension marked “Official.” 




Exhibit B
COMMONWEALTH OF PENNSYLVANIA
EMPLOYMENT BOARD
of the
DEPARTMENT OF PUBLIC ASSISTANCE
Harrisburg

Center
Identification No.
Date

Mr. Albert [ ]
Mr. Brown [ ]                                      Form D

PERFORMANCE TEST FOR TELEPHONE OPERATORS
Series 1000


1. EXTENSION TO EXTENSION (Practice Exercise)

Mr. Albert lifts receiver
Operator answers                                  Promptness ( )  “Yes, please?” ( )
Mr. Albert asks for Mr. Brown
  “Mr. Brown, please.”
Operator acknowledges                             “Thank you.” ( )
Operator rings Mr. Brown                          Brown’s phone rings ( )
Mr. Brown answers “Mr. Brown speaking.”           Connection completed ( )
                                                  Connection maintained ( )
Mr. Albert and Mr. Brown hang up after a
  few seconds


2. EXTENSION TO TRUNK

Mr. Brown lifts receiver
Operator answers                                  Promptness ( )  “Yes, please?” ( )
Mr. Brown asks for city line
  “City line, please.”
Operator acknowledges                             “Thank you.” ( )
Operator connects Mr. Brown with trunk line       Dial tone ( )  Promptness ( )
Mr. Brown dials listed number                     Ringing signal ( )
Mr. Albert personally checks number of trunk
  line to which Mr. Brown has been connected      ( )


3. TRUNK TO EXTENSION WHICH DOES NOT ANSWER

Mr. Brown’s call comes in on trunk line
Operator answers                                  Promptness ( )
                                                  “Public Assistance.” ( )
Calling party (Mr. Brown) asks for Mr. Carson
  “Mr. Carson, please.”
Operator acknowledges                             “Thank you.” ( )
Operator rings Mr. Carson                         Carson’s phone rings ( )
                                                  Promptness ( )
Operator gives ringing report. Mr. Brown          Ringing report every 40 seconds ( )
  tells Operator to continue ringing              Appropriate phrase ( )
                                                  Connection maintained until transfer ( )











4. EXTENSION TO BUSY EXTENSION; EXTENSION TO TRUNK

Mr. Albert lifts receiver a few seconds after
  Mr. Carson’s telephone first rings
Operator answers                                  Promptness ( )  “Yes, please?” ( )
Mr. Albert asks for Mr. Brown
  “Mr. Brown, please.”
Operator gives busy report                        Busy report ( )  Promptness ( )
Mr. Albert asks for city line
  “City line, please.”
Operator acknowledges                             “Thank you.” ( )
Operator connects Mr. Albert with trunk line      Dial tone ( )  Promptness ( )
Mr. Albert dials listed number                    Ringing signal ( )


5. TRUNK TO BUSY EXTENSION

Mr. Albert’s call comes in on trunk line
Operator answers                                  Promptness ( )
                                                  “Public Assistance.” ( )
Calling party (Mr. Albert) asks for Mr. Carson
  “Mr. Carson, please.”
Operator gives busy report and asks calling       Busy report ( )  Hold line ( )
  party to hold line                              Appropriate phrases ( )
Calling party (Mr. Albert) hangs up


6. TRANSFERRING INCOMING CALL

Mr. Albert lifts receiver
Operator answers                                  Promptness ( )  “Yes, please?” ( )
Mr. Albert asks to have Mr. Carson’s call
  “Let me have Mr. Carson’s call, please.”
Operator acknowledges                             Appropriate phrase ( )
Operator transfers incoming call from Mr.         Appropriate phrase ( )
  Carson to Mr. Albert                            Transfer ( )  Promptness ( )
                                                  Connection maintained ( )
Mr. Albert and Mr. Brown hang up after a
  few seconds


7. EXTENSION TO TRUNK

Mr. Brown lifts receiver
Operator answers                                  Promptness ( )  “Yes, please?” ( )
Mr. Brown asks for city line
  “City line, please.”
Operator acknowledges                             “Thank you.” ( )
Operator connects Mr. Brown with trunk line       Dial tone ( )  Promptness ( )
Mr. Brown dials listed number                     Ringing signal ( )
Mr. Albert asks Operator number of trunk line
  to which Mr. Brown has been connected           ( )


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


Exhibit B (Continued) 
8. TRUNK TO BUSY EXTENSION 











Mr. Brown’s call comes in on trunk line 





Operator answers 


Promptness (_ ) 
“Public Assistance.” (  ) 





Calling party (Mr. Brown) asks for 
Mr. Brown “Mr. Brown, please.” 





Operator gives busy report and asks calling 


Busy report ( ) Hold line ( 





party (Mr. Brown) to hold line 
Calling party (Mr. Brown) holds line 





Appropriate phrases (_ ) 


) 











fer ( 


Connection maintained until trans- 





9. TRANSFERRING OUTGOING CALL 





Mr. Albert lifts receiver 





Operator answers 


Promptness ( ) “Yes, please?” (_) 





Mr. Albert asks to have call on Mr. Brown’s 


line transferred 
“Transfer the call on Mr. Brown’s line to me, please.” 





Operator acknowledges 


Appropriate phrase (_ ) 





Operator transfers outgoing call from Mr. 
Brown’s telephone to Mr. Albert’s telephone 


Appropriate phrase (_ ) 





Mr. Albert listens for open line 


Open line ( ) Promptness ( 


) 





Mr. Albert and Mr. Brown hang up 











10. EXTENSION TO EXTENSION

Mr. Brown lifts receiver
Operator answers                                Promptness ( )  “Yes, please?” ( )
Mr. Brown asks for Mr. Carson                   “Mr. Carson, please.”
Operator acknowledges                           “Thank you.” ( )
Operator rings Mr. Carson                       Carson’s phone rings ( )
Mr. Carson (Albert) answers                     “Mr. Carson speaking.”
                                                Connection completed ( )
                                                Connection maintained ( )
Mr. Carson (Albert) and Mr. Brown hang up
  after a few seconds








11. EXTENSION TO TRUNK

Mr. Albert lifts receiver
Operator answers                                Promptness ( )  “Yes, please?” ( )
Mr. Albert asks for city line                   “City line, please.”
Operator acknowledges                           “Thank you.” ( )
Operator connects Mr. Albert with trunk line    Dial tone ( )  Promptness ( )
Mr. Albert dials all but last digit of listed
  number and holds line
Mr. Brown asks Operator number of trunk line
  to which Mr. Albert has been connected







Exhibit B (Continued) 
12. TRANSFERRING OUTGOING CALL 


Mr. Brown lifts receiver
Operator answers                                Promptness ( )  “Yes, please?” ( )
Mr. Brown asks to have call on Mr. Albert’s     “Transfer the call on Mr. Albert’s line
  line transferred                                to me, please.”
Operator acknowledges                           Appropriate phrase ( )
Operator transfers outgoing call from Mr.
  Albert’s telephone to Mr. Brown’s telephone   Appropriate phrase ( )
Mr. Brown listens for open line                 Open line ( )  Promptness ( )
Mr. Albert and Mr. Brown hang up





VOICE 


To what extent is the Operator’s voice clear, distinct, pleasant? 1 2 3 4 
(1) Very unsatisfactory; (2) Unsatisfactory; (3) Satisfactory; (4) Very satisfactory. 


REMARKS: 





Examiner 







Exhibit C 


PROCEDURE FOR SCORING TELEPHONE OPERATOR PERFORMANCE TEST
Series 1000

Note: All scoring must be checked and must carry the initials of
both scorer and checker.

1. Check to see that there are two rating sheets and an Admittance
   Slip for each examinee and that the Identification Number on each
   is identical.

2. Write the examinee’s Identification Number and County in the
   spaces provided on Form EB-695. (Use Form EB-696 for Form M.)

3. Place a check mark on the rating sheet after the name of each call
   completed by the examinee.

4. Place a check mark in the appropriate space on Form EB-695 for
   each completed call and enter the total number of calls completed
   in the box provided.

5. Eliminate from further consideration examinees who completed
   fewer than the minimum number of calls required for their County.
   (See attached schedule.)

6. Score the rating sheets of examinees who completed a sufficient
   number of calls. Place the number of credits after each line and
   place the total number of credits for each call in a circle to the
   right of the last line of the call. (See attached schedule of credits.)

7. Transfer the number of credits for each call to the appropriate space
   on Form EB-695.

8. Place a check mark or an “X” in each of the three spaces provided
   after the word “Trunks” on Form EB-695 and refer to the attached
   schedule for the number of credits to be entered in the space
   to the right.

9. Place a check mark in the appropriate spaces after the word “Voice”
   on Form EB-695 and refer to the attached schedule for the number
   of credits to be entered in the space to the right.

10. Enter the total number of credits earned (Raw Score) in the box
    provided on Form EB-695.


Exhibit D

[Facsimile of a score summary form, printed sideways in the original.
Legible fields: Identification No., County, Calls Completed (by call
number), Trunks, Voice, Raw Score.]


Exhibit E

[Facsimile of a second score summary form, printed sideways in the
original, with the same legible fields.]


Exhibit F 


PROCEDURE FOR SCORING TELEPHONE OPERATOR PERFORMANCE TEST 
Series 1000 
SCHEDULE OF CREDITS 
Forms D and M* 


CREDIT—1 POINT CREDIT—2 POINTS 
Proper phrase Ringing extension telephone 
“Yes, please?” 
“Public Assistance.” CREDIT—4 POINTS 
“Thank you.” Busy report 
Promptness (M) Central answers 


Appropriate phrase 
Ringing report 
Request to hold line 
CREDIT—6 POINTS 
Extension-to-extension connection maintained 
Note: If connection is completed but not maintained, 3 points
(D) Ringing signal (after dialing) 
(D) Open line (call 11 only) 
CREDIT—8 POINTS 
Incoming call transferred (connection maintained ) 
Note: If transfer is made without maintaining connection, 4 
points 
(M) Outgoing call transferred (open line) 
CREDIT—10 POINTS 
(D) Outgoing call transferred (open line—calls 9 and 12 only) 


USE OF HIGHEST NUMBER TRUNK LINE: 
Once (D) 0 (M) 3 
Twice (D) 5 (M) 8
Three times (D) 12 
VOICE: 
Satisfactory—2 (each observer) 
Very satisfactory—3 (each observer ) 


PASSING POINTS 
Class II 10 completed calls 
Class III 9 completed calls 
Class IV 8 completed calls 


*“D” in parentheses indicates that the credit applies only to Form D;
“M” in parentheses indicates that the credit applies only to Form M.
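
The tally described in Exhibits C and F can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the agency's procedure: the function and variable names are hypothetical, only Form D trunk credits are modeled, the per-call credit totals are supplied by the scorer, and voice ratings of 1 and 2 are assumed to earn no credit because the schedule lists credits only for "Satisfactory" and "Very satisfactory."

```python
# Trunk credits for Form D, keyed by the number of times the highest-numbered
# trunk line was used (values taken from the schedule of credits).
TRUNK_CREDITS_FORM_D = {1: 0, 2: 5, 3: 12}

# Voice credits per observer, keyed by the 1-4 rating on the rating sheet.
# Ratings 3 and 4 follow the schedule; zero credit for 1 and 2 is an assumption.
VOICE_CREDITS = {1: 0, 2: 0, 3: 2, 4: 3}

# Passing points: minimum number of completed calls by county class.
PASSING_CALLS = {"II": 10, "III": 9, "IV": 8}


def meets_minimum(completed_calls, county_class):
    """Step 5 of Exhibit C: screen on the number of completed calls."""
    return completed_calls >= PASSING_CALLS[county_class]


def raw_score(call_credits, trunk_uses, voice_ratings):
    """Steps 6 to 10 of Exhibit C: add call, trunk, and voice credits."""
    calls = sum(call_credits)
    # Usage counts that the schedule does not list earn nothing (assumption).
    trunks = TRUNK_CREDITS_FORM_D.get(trunk_uses, 0)
    voice = sum(VOICE_CREDITS[rating] for rating in voice_ratings)
    return calls + trunks + voice


# Hypothetical examinee: three scored calls, highest trunk used twice,
# and two observers rating the voice 3 and 4.
example_score = raw_score([11, 8, 9], trunk_uses=2, voice_ratings=[3, 4])
```

With these hypothetical inputs the call credits contribute 28, the trunk line 5, and the two voice ratings 5.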







SOME DATA ON THE KUDER PREFERENCE 
RECORD 


ARTHUR E. TRAXLER 


Educational Records Bureau 


AND 
WILLIAM C. McCALL


University of South Carolina 


A WELL-ROUNDED guidance program calls for at least four
types of objective measures: general intelligence,
achievement in various fields of study, aptitudes
of different types, and interests or motivation. Far more 
progress has been made in the first three of these areas 
than in the fourth. In recent years, however, there has 
been an especially large amount of experimentation in the 
last area and some promising measuring instruments are 
beginning to emerge. 

The majority of the noteworthy instruments for ap- 
praising interests have been concerned with occupational 
preferences. The most important work in this field has 
been done by Strong, who has constructed blanks and pre- 
pared scales for the measurement of the interests of men 
with respect to 34 occupations and the interests of women 
in connection with 18 occupations. Although the instru- 
ments developed for the measurement of interest in spe- 
cific vocations unquestionably have important guidance 
values, at least two considerations point to a trend away 
from the measurement of interests in occupations as such 
and toward the measurement of interests in broad fields. 

One consideration is based on observation and re- 
search. It has been known almost from the first attempts 
to measure vocational interests that interests in certain 
vocations are rather highly correlated. It has been apparent
that there are clusters of occupations that have so
many points of similarity that interest in one occupation 
is a strong indication of interest in several others. Factor- 
analysis studies have given emphasis to this point. For 
example, by means of a factorial analysis of the Strong 
Vocational Interest Blank, Thurstone¹ found four interest
groups. These groups were associated with science, lan- 
guage, people, and business. 

The second consideration grows out of the practical, 
everyday work of counselors and personnel officers. These 
workers have found that frequently when one is attempt- 
ing to guide the development of secondary-school pupils, 
or even of college freshmen, guidance with respect to 
specific occupations is not needed. In fact, guidance into 
specialization so early would in many cases be unwar- 
ranted. What is needed is a valid, reliable measure of 
interests in fairly broad fields so that the individual may 
be guided in the general direction of a group of related 
occupations, one of which will perhaps be chosen defi- 
nitely when the student has attained greater maturity. 

Strong, himself, has been one of the first to recognize 
the need for broader measurement of interests as well as 
measurement of interests related to specific vocations. In 
line with this viewpoint, he has recently published several 
group scales for the measurement of interests in broad 
areas. 

Certain other investigators have been working along 
somewhat similar lines. Probably the most promising 
new instrument in this general field is the Preference 
Record by G. F. Kuder.²

¹ L. L. Thurstone, “A Multiple Factor Study of Vocational Interests,”
Personnel Journal, X (1931), 198-205.
² G. Frederic Kuder, Preference Record (Chicago: Science Research
Associates, 1939).


Description of the Preference Record

The Preference Record is designed for use in obtaining
measures of motivation in the following seven fields:
scientific, computational, musical, artistic, literary, social
service, and persuasive. It consists of 330 paired-comparison
items of which the following are samples:

A. (1) Draw graphs
   (2) Do clerical work
B. (1) Be a lawyer
   (2) Be a landscape architect
C. (1) Sell insurance
   (2) Do scientific research work.

The subject indicates in each case which one of the pair
of activities he prefers.

The test is intended for use in high school and college.
It is administered without time limit. The booklet is used
with separate answer sheets, one for hand scoring, one for
machine scoring, and one for self scoring. The raw scores
of an individual student may be plotted on a percentile
chart and thus a graphic indication of high points and
low points with respect to the seven fields may be
obtained.


Nature and Purpose of the Study

Kuder³ has described the construction of the Preference
Record in some detail and has reported a consider-
able amount of statistical data for it. Helpful as these 
data are, they naturally do not cover all questions about 
the blank. Since no other studies of this new instrument 
were available, it seemed desirable to try to obtain 
answers to certain questions before arriving at decisions 
about the use of the blank in a regular testing program. 
The questions which this study attempts to answer are as
follows:

1. What is the retest reliability of the scores on the
   Kuder Preference Record?
2. Are the scores on the Preference Record relatively
   stable over a long period?
3. What differences are there between the mean scores
   for boys and for girls on the Preference Record?
4. Do the mean scores for different secondary-school
   groups change appreciably with change in grade
   level?
5. What is the shape of the mean profiles of university
   freshmen in different fields of study?

³ G. F. Kuder, “The Stability of Preference Items,” The Journal of Social
Psychology, X (1939), 41-50.


The data were obtained by administering the Prefer- 
ence Record to freshmen in the University of South Caro- 
lina, pupils in Grades 10, 11, and 12 of a high school in 
South Carolina, and a number of adults who were on the 
staff of an educational organization in New York City. 

Reliability 

In the manual for the Preference Record, Kuder 
gives the following reliabilities for the different scales: 
scientific, .87; computational, .85; musical, .88; artistic, 
.90; literary, .90; social service, .84; persuasive, .90. These 
reliabilities were estimated from one administration of 
the test to a group of 84 college students through the 
application of the Kuder-Richardson method of estimating
reliability coefficients.⁴ Since the procedure employed
is still somewhat experimental and is not as yet generally 
used, it seemed advisable to check the reliability of the 
various scales by the more familiar test-retest procedure. 
Accordingly, 52 college freshmen and 90 high-school 
pupils who had filled out the Preference Record near the 
beginning of the term were retested after an interval of a 
few weeks. The elapsed time was approximately one 
month for the high-school pupils and two months for the 
college students. The correlations between the scores re- 
sulting from the two administrations are shown in Table 
1. Means and standard deviations of the distributions are 
also given. 
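
For readers unfamiliar with single-administration estimates, the sketch below shows Kuder-Richardson formula 20, one of the formulas in the cited Kuder and Richardson paper. It is offered only as an illustration: the text does not say which of the formulas Kuder applied, the function name `kr20` and the toy data are hypothetical, and the items are assumed to be scored 0 or 1 toward a single scale, with the variance of total scores computed over N rather than N - 1.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    item_scores holds one row per person and one column per item.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])

    # Variance of the total scores (population form, dividing by N).
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    # Sum over items of p * q, where p is the proportion scoring 1 on the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Toy data: four people, three items (hypothetical values).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_reliability = kr20(toy)
```

Because the estimate rests on item variances within a single sitting, it cannot reflect changes in preference over time, which is the gap the retest design in this study addresses.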

⁴ G. F. Kuder and M. W. Richardson, “The Theory of the Estimation of
Test Reliability,” Psychometrika, II (1937), 151-60.


TABLE 1

RETEST RELIABILITY OF THE KUDER PREFERENCE RECORD BASED
ON THE SCORES OF SECONDARY SCHOOL PUPILS AND OF COLLEGE
FRESHMEN IN SOUTH CAROLINA

                        Secondary School Pupils                      College Freshmen
Scale            N   r ± PE       Mx     SDx    My     SDy    |  N   r ± PE       Mx     SDx    My     SDy
Scientific       90  .907±.013   41.80   9.33  41.83   9.25   |  52  .782±.036   42.08   9.68  43.65   9.19
Computational    90  .814±.024   19.82   7.24  19.38   6.75   |  52  .748±.041   18.15   7.88  18.73   6.70
Musical          90  .876±.017   16.69   7.39  16.24   7.18   |  52  .871±.023   18.92   7.15  17.54   6.49
Artistic         90  .857±.019   30.87   8.97  31.10   8.04   |  52  .820±.031   30.92   8.86  29.85   8.36
Literary         90  .863±.018   32.30   9.71  32.80  10.44   |  52  .789±.035   31.56  10.10  34.27  10.67
Social Service   90  .838±.021   39.97   9.66  41.20  10.15   |  52  .588±.061   41.37   9.42  43.54   8.02
Persuasive       90  .838±.021   45.27   9.25  46.27   8.95   |  52  .795±.034   45.62   9.30  46.65   9.16


For all scales, the correlations between the two
administrations of the Preference Record to the secondary-
school group are above .8. They vary from approximately
.81 to about .91. With the exception of the correlation
for the social service scale, the correlations between the
two administrations of the test to the college group are
above .74. They range upward to approximately .87. In
general, these coefficients are rather high for correlations
based on retesting after an interval of several weeks. In
fact, most of the correlations seem exceptionally
satisfactory for a measuring device that can be administered and
scored so quickly and that yields as many as seven scores.
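
The PE column of Table 1 can be checked with a short sketch. The article does not state how the probable errors were computed; the code assumes the conventional probable error of a correlation coefficient, 0.6745(1 - r²)/√N, which reproduces the tabled values to three decimals. The function names are illustrative, and `pearson_r` is included only to show how a retest coefficient would be obtained from paired scores.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error(r, n):
    """Conventional probable error of r (assumed, not stated in the article)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Scientific scale, secondary-school pupils: r = .907, N = 90.
pe_secondary_scientific = round(probable_error(0.907, 90), 3)

# Social service scale, college freshmen: r = .588, N = 52.
pe_college_social = round(probable_error(0.588, 52), 3)
```

The smaller college sample and the lower coefficient both widen the probable error, which is why the social service entry for freshmen carries the largest PE in the table.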

The correlations based on the secondary-school group 
are high enough to warrant considerable use of the Pref- 
erence Record in individual prediction and guidance. 
Reliability coefficients above .90 are theoretically desir- 
able for a test that is to be used in this way, but experience 
indicates that they are very seldom attained in a test that 
yields several different scores. 

The correlations for the college freshmen tend to be 
lower than those for the secondary-school pupils, 
although there is no significant difference in the case of 
the musical scale. The correlation for the social service 
scale, .588, is much the lowest in the group. The second- 
ary-school data indicate, however, that this scale is not 
less reliable than some of the others.⁵

⁵ Since this correlation was out of line with the others, it was rechecked with
the greatest of care. Every paper was rescored, the data were redistributed,
and the entire calculation was carried through a second time. It seems certain,
therefore, that no error of a clerical nature is involved.

The reason for the somewhat higher correlations at
the secondary-school level than at the college-freshman
level is not entirely clear. It was thought at first that
possibly the combining of the three high-school grades into
one group had increased the variability and thus raised
the correlations. However, a comparison of the standard
deviations shows that in general the variability is not
greater for the high-school group.

The difference in magnitude of the correlations may 
be due simply to the longer time interval between admin- 
istrations of the Preference Record to the college group. 
By the same reasoning, the correlations for the high- 
school pupils may be a little lower than they would have 
been if only a few days had elapsed before the test was 
repeated. It is improbable, however, that the basic inter- 
ests and motives of either high-school or college students 
change significantly during a period of a few weeks. 
Moreover, the repetition of the Preference Record after 
a very brief period would have been subject to the limitation
that a memory factor might have produced spuriously
high correlations.

The retest correlations based on the secondary-school 
group correspond rather well with the estimated reliabili- 
ties reported by Kuder. The correlation obtained in this 
study for the scientific scale is a little higher than Kuder’s 
figure. The two sets of reliabilities for the musical scale 
and the social service scale would agree exactly if those 
reported here were rounded to two decimal places. In 
the case of the other four scales, the retest correlations for 
the secondary-school pupils are lower than Kuder’s reli- 
abilities, but the differences are not marked. 

The college-freshman retest correlation for the musical
scale is in very close agreement with Kuder’s reliabil-
ity coefficient. The college-freshman correlations for the 
other scales are significantly lower than those found by 
Kuder, but the only striking difference is between the 
correlations for the social service scale. 


Means and Standard Deviations 
Although the point does not bear on one of the main
questions raised in this study, it may be noted in passing
that the means and the variabilities of the distributions
resulting from the two administrations of the Preference
Record tend to be closely similar in both groups.
Apparently the practice effect was negligible; that is, the scores
did not tend to be higher on the second administration as
a result of the subjects’ having taken the test previously.
The absence of evidence of practice effect is a further
point in favor of the Preference Record.

Another interesting observation based on the means
and standard deviations shown in Table 1 is that, on the
whole, the differences between the two groups in central
tendency and variability are slight. This observation
suggests that interests in the seven areas involved are
relatively mature by the time pupils enter the secondary
school. The largest difference in favor of the college-
freshman group is found in the social service scale, a
result which familiarity with scores of high-school and
college students on the Strong Vocational Interest Blank
would lead one to expect.


Stability of the Scores 


We have just noted that the retest correlations for the
scales of the Kuder Preference Record tend to be rather
high for an interval of a few weeks. But how high would 
they be for a rather long period—let us say, a year or 
more? In other words, what is the stability of the scores 
and what is their value for long-time predictions? Some 
information relative to these questions is provided by the 
correlations in Table 2, which are based on the retesting
of 16 adults with the Preference Record after an interval
of approximately 15 months.


TABLE 2

CORRELATIONS BETWEEN SCORES MADE BY SIXTEEN ADULTS ON
TWO ADMINISTRATIONS OF THE KUDER PREFERENCE RECORD
AFTER AN INTERVAL OF APPROXIMATELY FIFTEEN MONTHS

Scale            N   r ± PE       Mx     SDx    My     SDy
Scientific       16  .828±.053   47.25  12.14  47.50  10.94
Computational    16  .864±.043   24.56   8.27  23.81   8.68
Musical          16  .933±.022   22.00   8.46  22.63   8.31
Artistic         16  .698±.086   31.38   6.90  31.25   8.35
Literary         16  .810±.058   44.25   8.48  44.25   8.18
Social Service   16  .611±.106   42.38   7.17  41.50   8.29
Persuasive       16  .883±.037   31.38  11.70  32.88  12.42

The correlations in Table 2 range from above .6 for 
the social service scale to above .9 for the musical scale. 
The correlations for all the scales except the artistic and 
social service ones are above .8. The reliability of these 
correlations is of course limited by the small number of 
cases. For example, the correlation coefficient for the 
artistic scale was lowered considerably by a marked 
change in the score of one person. 

Nevertheless, the correlations in Table 2 suggest that 
in general the scores on the Preference Record are rather 
stable for a period as long as 15 months and that they 
provide a fairly satisfactory basis for long-time predic- 
tions. Emphasis is given to this point when one examines 
the preference profiles of the different individuals. In 
nearly all cases, the high and the low points resulting 
from the first administration of the test were closely simi- 
lar to those based on the second administration. The pro- 
files for two individuals, in terms of percentiles, are 
shown in Figures 1 and 2.


Sex Differences 


When one is interpreting the profile of an individual
on the Preference Record, it is of some interest to know
whether there are characteristic differences between the
average scores of boys and girls. The mean scores made
on the various scales of the Preference Record by groups
of high-school boys and girls and college-freshman boys
and girls in South Carolina are presented in Table 3.


TABLE 3

MEAN SCORES OF GROUPS OF BOYS AND GIRLS IN HIGH SCHOOL
AND IN COLLEGE ON THE KUDER PREFERENCE RECORD

                                        Freshman      Freshman      Freshmen
                   High      High       Boys in a     Girls in a    in a
                   School    School     State         State         Girls’
Scale              Boys      Girls      University    University    College
Number of Cases    152       135        303           173           584
Scientific         48.1      37.9       47.9          41.4          41.7
Computational      21.6      16.9       21.8          17.7          17.8
Musical            13.2      18.2       15.8          19.7          20.6
Artistic           27.4      29.9       26.9          32.1          29.7
Literary           30.4      33.4       32.8          36.7          35.8
Social Service     37.8      44.8       40.7          45.4          46.7
Persuasive         47.9      44.9       49.4          42.8          44.0


[Figure 1. Profile of a Girl Secretary with a Long-Standing Interest
in Music. Percentile profile on the seven scales (1. Scientific,
2. Computational, 3. Musical, 4. Artistic, 5. Literary, 6. Social
Service, 7. Persuasive), with separate lines for the first and the
second testing.]

[Figure 2. Profile of a Machine Scoring Supervisor, an Occupation
for which Interest and Ability in Computation Are Very Important.
Percentile profile on the same seven scales, with separate lines for
the first and the second testing.]

As one might expect, in both groups the boys are on 
the average higher than the girls in scientific, computa- 
tional, and persuasive preferences, while the girls surpass 
the boys in musical, artistic, literary, and social service 
preferences. The largest differences are for the scientific 
scale. However, even the smallest differences in medians 
amount to nearly 10 percentile points. It appears, there- 
fore, that sex differences in preferences should be taken 
into consideration when the scores on this test are 
interpreted. 
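
The direction of the differences can be read directly from the high-school columns of Table 3, as the following sketch shows. It is an illustrative check only: the variable names are hypothetical, the scale order is assumed to follow the order in which the article lists the seven scales, and the comparison uses raw mean scores rather than the percentile differences mentioned in the text.

```python
SCALES = [
    "Scientific", "Computational", "Musical", "Artistic",
    "Literary", "Social Service", "Persuasive",
]

# Mean scores from the two high-school columns of Table 3.
HS_BOYS = [48.1, 21.6, 13.2, 27.4, 30.4, 37.8, 47.9]
HS_GIRLS = [37.9, 16.9, 18.2, 29.9, 33.4, 44.8, 44.9]

# Positive values mean the boys' mean exceeds the girls' mean.
differences = {
    scale: round(boys - girls, 1)
    for scale, boys, girls in zip(SCALES, HS_BOYS, HS_GIRLS)
}

boys_higher = [scale for scale in SCALES if differences[scale] > 0]
girls_higher = [scale for scale in SCALES if differences[scale] < 0]

# Scale with the largest absolute difference in raw mean score.
largest_gap = max(SCALES, key=lambda scale: abs(differences[scale]))
```

For the high-school groups the raw-score gap is widest on the scientific scale (10.2 points) and next widest on the social service scale (7.0 points in favor of the girls).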


Grade-Level Differences 


When the results of achievement tests are being stud- 
ied, the usual procedure is to interpret the scores in terms 
of the norms for the grade the pupils are in. Is it necessary
to follow this procedure with the Preference Record
or are the results in different grades so similar that one is 
justified in disregarding grade level, at least as far as the 
secondary school is concerned? Some information on this 
question is given in Table 4, which shows mean scores of 
boys and of girls in Grades IX, X, and XI of a South 
Carolina High School. 


TABLE 4

MEAN SCORES MADE ON THE PREFERENCE RECORD BY GROUPS OF
BOYS AND GIRLS IN GRADES IX, X, AND XI OF A SOUTH CAROLINA
HIGH SCHOOL

                            Boys                       Girls
                   Grade    Grade    Grade    Grade    Grade    Grade
Scale              IX       X        XI       IX       X        XI
Number of Cases    48       77       27       42       72       21
Scientific         48.8     48.4     45.8     36.1     39.0     38.0
Computational      20.5     22.0     22.2     16.9     16.9     16.7
Musical            11.3     13.6     15.2     18.9     17.4     19.8
Artistic           27.1     28.1     25.9     31.3     29.4     28.8
Literary           30.3     30.5     30.1     32.9     33.9     32.9
Social Service     40.0     37.0     36.3     44.0     44.6     47.1
Persuasive         47.8     47.5     49.2     42.3     46.0     46.4





Because of the rather small number of cases, the means
are not highly reliable indicators of the preferences at the
different grade levels. Nevertheless, the fluctuations in
mean scores made by the pupils in the three grades are not
great. Moreover, there is no consistent trend toward the
obtaining of higher or lower ratings with advancement
in grade level. The data in Table 4 are by no means
conclusive, but, as far as they may be interpreted, they suggest
that different norms for these three grades are not needed.


Mean Profiles for Different Fields of Study 


The means and standard deviations of scores of Uni- 
versity of South Carolina freshmen classified according 
to field of specialization are shown in Table 5. The per- 
centile ratings of the mean scores are indicated graph- 
ically in Figure 3. 

TABLE 5

MEANS AND STANDARD DEVIATIONS OF SCORES MADE ON KUDER
PREFERENCE RECORD BY VARIOUS GROUPS OF FRESHMEN IN THE
UNIVERSITY OF SOUTH CAROLINA, INCLUDING BOTH MEN AND
WOMEN

                  Engineering (B.S.)    Journalism            Art                   Education (A.B.)
Scale             N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.
Scientific        79   53.30   6.20     26   39.23   6.40     15   41.93   7.19     36   39.50  10.08
Computational     79   24.29   5.32     26   14.92   5.48     15   17.67  10.52     36   16.83   6.66
Musical           79   14.44   6.45     26   18.77   6.11     15   18.07   5.31     36   18.72   7.85
Artistic          79   29.18   7.17     26   25.23   7.22     15   45.67   7.33     36   25.50   7.98
Literary          79   29.38   7.44     26   54.46   6.55     15   31.13   5.54     36   38.28   8.69
Social Service    79   40.14   7.21     26   41.00   9.38     15   44.87   5.08     36   44.78  11.46
Persuasive        79   46.90   8.06     26   48.15   9.33     15   42.73   8.79     36   44.28   9.48

                  Commerce (B.S.)       Secretarial Science   Pre-Medicine          Pharmacy
Scale             N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.
Scientific        82   40.85   8.50     83   38.59   8        53   56.24   8.66     13   54.69   7.80
Computational     82   24.71   6.47     83   20.86   7.13     53   17.04   5.31     13   16.54   5.21
Musical           82   15.49   6.47     83   19.12   5.89     53   16.81   7.22     13   18.54   8.16
Artistic          82   24.98   6.16     83   30.18   8.11     53   26.77   7.02     13   28.08   6.82
Literary          82   31.12   8.53     83   33.84   8.55     53   34.28   7.95     13   30.08   7.13
Social Service    82   40.85   7.79     83   43.00   8.99     53   48.59   7.25     13   42.85   7.08
Persuasive        82   53.83   8.66     83   45.17   9.16     53   44.62  10.55     13   43.62   8.39

                                        Arts and              Arts and
                  Pre-Law               Sciences (A.B.)       Sciences (B.S.)
Scale             N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.
Scientific        32   40.00   3.50     169  41.43   8.60     49   51.45   9.41
Computational     32   20.69   5.85     169  17.64   6.68     49   19.57   6.98
Musical           32   16.69   5.22     169  19.13   7.32     49   16.18   7.17
Artistic          32   24.13   7.38     169  30.66   9.30     49   27.61   8.32
Literary          32   38.00   8.65     170  35.71  10.21     49   34.43   8.59
Social Service    32   38.13   6.34     169  44.33  10.38     49   43.12  10.07
Persuasive        32   57.94  10.99     170  44.29  10.42     49   44.10  10.65





[Figure 3. Mean Profiles of Groups of Freshmen Who Have Indicated
Educational or Occupational Choices in the Fields Named. Each panel
plots the percentile rank (horizontal axis, 20 to 90) of a group's
mean score on scales 1 to 7; legible panel titles include Engineering
(B.S.), Journalism, Art, Commerce (B.S.), Secretarial Science,
Pre-Medical, Pre-Law, and Arts and Sciences (A.B.).]

The profiles of no two groups are alike. The greatest
similarity probably is in the profiles for the pre-medical
and pharmacy groups, but even here the correspondence
is not especially close when all the scores are considered.
Most of the high points and low points in the profiles
occur according to one’s expectation. The art group, for
example, is rather low in the scientific scale but very high 
in the artistic scale. The journalism group is close to the 
thirtieth percentile on the scientific and computational 
scales, but almost up to the ninetieth percentile on the 
literary scale. The commerce group is approximately at 
the eightieth percentile in computational and persuasive 
interests, but close to the thirty-fifth percentile in scien- 
tific and literary interests. Both the pre-medical and the 
pharmacy students are high in scientific interests. The 
pre-medical students are also fairly high in social service 
interests. The pre-law students are close to the thirtieth 
percentile on the scientific and social service scales, but 
near the eighty-fifth percentile on the persuasive scale 
and not far below the seventieth percentile on the compu- 
tational scale. The engineering group is above the sev- 
entieth percentile in computational and scientific inter- 
ests, but below the thirtieth percentile in literary interests. 
In the manual for the Preference Record, Kuder has 
given median profiles for groups of students who have 
chosen occupations in the fields of writing, social service, 
physical sciences, political science, business and account- 
ing, veterinary medicine, medicine, and law. A compari- 
son of the profiles in Figure 3 with Kuder’s profiles re- 
veals noteworthy similarities between those for (1) jour- 
nalism and writing, (2) commerce and business and 
accounting, (3) pre-medical course and medicine, (4) 
pre-law and law, and (5) arts and sciences (B.S.) and 
physical sciences. The fact that the profiles derived from 
two independent sources for groups in the same general 
areas are similar and are on the whole in agreement with 
what one would reasonably expect is favorable to the 
reliability and validity of the Preference Record. 


Conclusions 


1. The retest reliability of the scales of the Kuder
Preference Record is rather high. The correlations
between the scores resulting from two administrations of the
Preference Record to a group of high-school pupils with
a time interval of about one month were above .8 for all
seven scales. The correlations between the scores based on
two administrations of the Record to a group of college
freshmen with a time interval of two months were above
.7 for six of the seven scales.

2. The scores on the Preference Record do not seem 
to be influenced by practice in taking the Record when 
there is an interval of several weeks between administrations
of the Record. The mean scores resulting from the
second administration of the Record were not appreciably 
or consistently higher than the scores obtained the first 
time the Record was taken. 

3. The scores on the Preference Record appear to 
have considerable value for relatively long-time predic- 
tions as far as adults are concerned. The correlations 
between the scores of 16 adults after an interval of 15 
months were fairly high, varying from about .6 to slightly 
above .9. 

4. There are noteworthy sex differences between the
mean scores of high-school and college boys and girls.
On the average, the boys exceed the girls in scientific, 
computational, and persuasive preferences; the girls are 
higher than the boys in musical, artistic, literary, and 
social service preferences. 

5. It appears that interests and motivation in the
seven areas involved are relatively mature by the time 
pupils reach the secondary school. The differences be- 
tween the mean scores of pupils in Grades IX, X, and XI 
were found to be slight, and there was no consistent trend 
toward higher scores with increase in grade level. Simi- 
larly, the differences between the mean scores of the sec- 
ondary-school and college freshman groups were small. 

6. Mean profiles were found for 11 groups of univer-
sity freshmen classified according to field of study or
occupational choice. The profiles tended to have high 
points and low points at the places where one would
expect them to be. A comparison with profiles found by 
Kuder showed that those for groups in the same general 
fields were similar in shape. 

In general, the data in this article are favorable to 
the Kuder Preference Record. By far the most important 
question that remains to be answered has to do with the 
validity of the Record. Do the scales really measure what 
they purport to measure? While certain aspects of the 
data reported in Kuder’s manual and in this article imply 
considerable validity, there is at present little direct evi- 
dence concerning the validity of the Preference Record. 
One of the writers reported a small amount of data on the 
validity of the Record in Buros’ 1940 Mental Measure- 
ments Yearbook, but there was nothing conclusive in the 
findings. Further study of the Preference Record could 
well be directed toward this question. 

















THE RELIABILITY OF RATIO SCORES 


LEE J. CRONBACH 
State College of Washington 


EDUCATIONAL measurements often give rise to
quotients or ratios obtained when one score is divided
by another. The intelligence quotient, achievement quo- 
tient, and per cent accuracy scores are examples. For 
the effective interpretation of such a measure, it is im- 
portant that an appropriate estimate of its reliability be 
obtained. While a formula for the reliability of ratios 
has been presented by Holzinger, this, like other ap- 
proaches, has limitations which apparently have not pre- 
viously been discussed. The present article is intended 
to summarize the procedures which may be applied to 
ratio scores and to indicate the conditions under which 
each is appropriate. 


Ratio scores appear to be particularly important in 
dealing with certain new-type tests such as those now 
being published by the Progressive Education Associa-
tion. In Test 1.41, Social Problems,¹ for example, the
student is presented with a description of a social prob-
lem, asked to state which of several solutions he favors,
and then to check, from a long list, which reasons he 
would advance to support his decision. Since the student 
may check as many reasons as he wishes, there is in prac- 
tice a wide range of “Total Reasons” among students. 
One important datum is the extent to which the student 
checks reasons which are inconsistent with his conclusion
and really support one of the other solutions. This is
expressed in the score “Number Inconsistent.” In order
to compare one student with another, it is convenient to
eliminate the comprehensiveness factor, expressing his
performance in a “Per Cent Inconsistent” score by divid-
ing Number Inconsistent by Total Reasons.

¹ Test 1.41, Social Problems. Chicago: Progressive Education Association.

Retest method. One of the most satisfactory estimates 
of the reliability of such a score is to be obtained by the 
retest method. This method has generally been used to 
determine the reliability of the I. Q. It is not easy to 
rule out possible practice effect, even when parallel forms 
are used. For some tests it is difficult to prepare a parallel 
form; even where two forms are available, it is often 
desired to estimate the reliability without the trouble a 
second testing requires. Once data from two forms are 
available, it is a simple matter to compute the two ratios 
for each student and correlate them. It will be shown 
below, however, that a coefficient so obtained may not be 
equally appropriate for all scores in a given population. 

Kuder-Richardson method. Where retesting is im- 
practicable, it is customary with ordinary tests to use the 
Spearman-Brown split-half procedure or the recently de- 
veloped Kuder-Richardson method. The Kuder-Rich- 
ardson method is based upon a summation of the variance 
of the items composing the total score;² since a ratio score
cannot be conceived as composed of a sum of items, the 
method is not applicable (except in that special case 
where the denominator of the ratio is a constant for all 
students). Whether a modification of this method can be 
developed which is appropriate for ratios is not known. 

Split-half method. The Spearman-Brown formula 
is based on the assumption that the two halves into which 
the test has been split may be added to form the whole. 
In the case of ratio scores, this assumption does not hold. 
Since the denominator of the ratio is, in general, different
in each half, $a_1/b_1 + a_2/b_2$ is not equal to $a/b$, nor to a
constant multiple of it. It would be possible to correlate
$a_1/b$ with $a_2/b$; this would, by the Spearman-Brown
formula, yield a coefficient for $a/b$. Since
this disregards the possibility of error in measuring the
denominator, it does not estimate the reliability of the
ratio in the usual sense of $r_{\frac{a_1}{b_1}\frac{a_2}{b_2}}$. It is obvious that these
objections do not apply to the special case where the de-
nominator is the same for every student, as in the per cent
accuracy score on a test where every student attempts
every item. Here, the reliability of the ratio is the same
as the reliability of the numerator.

² G. F. Kuder and M. W. Richardson, “The Theory of the Estimation of
Test Reliability,” Psychometrika, II (1937), 151-60.


Computation by formula. Statistical formulas for 
obtaining the mean, standard deviation, and correlations 
involving ratios by indirect methods were developed by 
early workers. These formulas, obtained by assuming 
that the variation of the denominator is small compared 
to its mean, are as follows: 


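If $i = \frac{a}{b}$, $j = \frac{c}{d}$, $v_a = \frac{\sigma_a}{M_a}$, and so on, then

$$M_i = \frac{M_a}{M_b}\left(1 - r_{ab} v_a v_b + v_b^2\right);$$ (1)

$$\sigma_i^2 = \frac{M_a^2}{M_b^2}\left(v_a^2 - 2 r_{ab} v_a v_b + v_b^2\right);$$ (2)³

$$r_{ij} = \frac{r_{ac} v_a v_c - r_{ad} v_a v_d - r_{bc} v_b v_c + r_{bd} v_b v_d}{\sqrt{v_a^2 + v_b^2 - 2 r_{ab} v_a v_b}\,\sqrt{v_c^2 + v_d^2 - 2 r_{cd} v_c v_d}}.$$ (3)⁴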


If the reliability of a score is conceived as its correlation
with itself, one may substitute $a$ for $c$ and $b$ for $d$ in
formula (3), obtaining this formula for the reliability:

$$r_{ii} = \frac{r_{aa} v_a^2 - 2 r_{ab} v_a v_b + r_{bb} v_b^2}{v_a^2 - 2 r_{ab} v_a v_b + v_b^2}.$$ (4)⁵

³ G. U. Yule and M. G. Kendall, Introduction to the Theory of Statistics
(12th ed., revised; Philadelphia: Lippincott, 1937), pp. 299-300.

⁴ Karl Pearson, “On a Form of Spurious Correlation Which May Arise
When Indices Are Used in the Measurement of Organs,” Proceedings of the Royal
Society, LX (1897), pp. 489 ff.

This formula may be employed wherever the necessary 
data can be computed. Since $r_{aa}$, $r_{bb}$, and the other vari-
ables can be obtained without a retest, data from a single
testing are sufficient. Either the split-half or Kuder- 
Richardson method may be used to estimate these relia- 
bilities. It must be emphasized, however, that the 
formula is applicable only when the basic assumption is 
valid, namely, when the spread of the denominator vari- 
able is small compared to its mean. 
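
As a rough illustration of formula (4), the sketch below computes the approximate reliability of a ratio from the two coefficients of variation, the two reliabilities, and the correlation between numerator and denominator. The function name, argument names, and the numerical figures are assumptions made for this example; they do not come from the article's data.

```python
def ratio_reliability(v_a, v_b, r_aa, r_bb, r_ab):
    """Formula (4): approximate reliability of the ratio i = a / b.

    v_a, v_b:   coefficients of variation (sigma / M) of numerator and denominator
    r_aa, r_bb: reliabilities of numerator and denominator
    r_ab:       correlation between numerator and denominator
    """
    shared = 2 * r_ab * v_a * v_b
    denominator = v_a ** 2 - shared + v_b ** 2
    if denominator <= 0:
        raise ValueError("the ratio has no variance under these inputs")
    return (r_aa * v_a ** 2 - shared + r_bb * v_b ** 2) / denominator


# Illustrative figures only; they are not taken from the article.
example = ratio_reliability(v_a=0.20, v_b=0.10, r_aa=0.90, r_bb=0.95, r_ab=0.50)
print(round(example, 3))

# With a constant denominator (v_b = 0) the result equals r_aa.
constant = ratio_reliability(v_a=0.20, v_b=0.0, r_aa=0.90, r_bb=0.0, r_ab=0.0)
print(round(constant, 3))
```

The second call reproduces the special case noted earlier: when the denominator does not vary, the reliability of the ratio is simply that of the numerator. The function is meaningful only under the stated assumption that the spread of the denominator is small compared to its mean.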


In actual school testing of a single grade, the variation
in mental age or chronological age is normally quite a
small fraction of the mean M.A. or C.A. The ratio $\sigma/M$
for M.A. and C.A. is likely to be sufficiently small that
higher powers can be neglected; as a result, the formula
can be applied to either the A.Q. or the I.Q. under this
condition. An empirical test of the formula by Morley
showed close correspondence between formula results and
results from a retest of 381 pupils, all in Grade VIA, on
several achievement quotients.⁶ It does not follow that
the formula can be applied to the achievement quotient
if several grades are included in one population.


When the Per Cent Inconsistent score is studied, one 
finds that the assumption does not necessarily hold. The 
average student checks less than 50 reasons, but the range 
in Total Reasons often is from 10 reasons to 70 reasons. 
The coefficient of variation of the denominator in such a 
case is so high that error may follow when the formula is 
applied. A further confusion lies in the fact that all
scores in a given population are not equally reliable. A
digression to demonstrate this point is necessary before
methods of attacking this problem can be presented.

⁵ This formula was first developed by Holzinger. See K. J. Holzinger,
“Formulas for the Correlation between Ratios,” Journal of Educational
Psychology, XIV (1923), 344-47. It may also be derived directly by approxima-
tion from expansions of infinite series.

⁶ C. A. Morley, “The Reliability of the Achievement Quotient,” Journal of
Educational Psychology, XXI (1930), 355-56.


It is well known that the reliability of a test is a func- 
tion of the length of the test. The student who marks 
three inconsistent reasons out of 10 reasons used, and the 
student who marks 30 inconsistent reasons out of the 100 
he uses, both receive Per Cent Inconsistent scores of 30. 
The score of the latter student is an estimate of his in- 
consistency based on 100 responses; the estimate of the 
former student is based on only 10. If the former student 
were to mark one additional reason, his Per Cent Incon-
sistent score would increase to 36 per cent or decrease
to 27 per cent; if the second student were to mark one 
more reason, his score would shift upward only to 30.7 
per cent or downward to 29.7 per cent, depending, of 
course, on whether the additional reason were consistent 
or not. Similarly, a change of only one point in the 
numerator produces a much greater change in the ratio 
for the student whose denominator score is low. From a 
logical point of view, then, we would expect the standard 
error of measurement of a ratio score to increase as the 
denominator decreases. The standard error of measure- 
ment and the reliability coefficient are inversely related; 
therefore, the reliability of a score increases as the size 
of the denominator increases. 
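
The arithmetic in the paragraph above can be checked with a short sketch. The helper names are hypothetical; the two students (3 of 10 and 30 of 100) are the ones described in the text.

```python
def per_cent(part, total):
    return 100.0 * part / total


def after_one_more_reason(inconsistent, total):
    """Scores if the next reason checked is consistent or inconsistent."""
    if_consistent = per_cent(inconsistent, total + 1)
    if_inconsistent = per_cent(inconsistent + 1, total + 1)
    return if_consistent, if_inconsistent


for inconsistent, total in [(3, 10), (30, 100)]:
    low, high = after_one_more_reason(inconsistent, total)
    print(f"{inconsistent}/{total}: {per_cent(inconsistent, total):.1f} -> "
          f"{low:.1f} or {high:.1f}")
# 3/10: 30.0 -> 27.3 or 36.4
# 30/100: 30.0 -> 29.7 or 30.7
```

A single added response moves the first student's score by roughly three to six points and the second student's by less than one, which is the sense in which the score built on the larger denominator is the more stable.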


Possibly reference to the per cent accuracy concept 
will further clarify this point. If we were to ask a stu- 
dent a single question, he could answer it correctly or 
incorrectly or could omit it. If we desired to know the 
percentage of his attempts that were successful, we could 
compute a per cent accuracy score, which, based on one 
question, could only be 100, 0, or indeterminate. Certainly 
no measure of this sort, based on one item, would be con- 
sidered significant. If two questions were asked, he could 
have both right, both wrong, one right and one wrong, or 
could omit either, or both. His per cent accuracy score,
under each of these conditions in order, would be 100, 0, 
50, 100 or 0, or indeterminate. Obviously, when a score 
of 50 per cent is possible, discrimination is finer than 
when only scores of 100 and 0 are possible. Similarly, as 
the number of items attempted increases, discrimination 
becomes increasingly fine, which means that accuracy, 
hence reliability, of measurement increases. No matter 
how many items are added to the test, if a student omits 
all the items no meaningful per cent accuracy score can 
be obtained, and, in general, the accuracy with which his 
performance is measured depends upon the number of 
items he attempts. Trimble has suggested⁷ that this
applies not only to the ratio scores, but to scores on any 
test where the student is instructed to respond to as many 
items as he wishes; this pattern is found in several tests 
of the Progressive Education Association series. 
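
The point about fineness of discrimination can be made concrete by listing the per cent accuracy scores that are attainable for a given number of items attempted. This is only a sketch; the function name is an assumption introduced for the example.

```python
from fractions import Fraction


def attainable_scores(items_attempted):
    """All per cent accuracy scores possible for a given number of attempts."""
    if items_attempted == 0:
        return []  # no attempts: the score is indeterminate
    return [Fraction(100 * right, items_attempted)
            for right in range(items_attempted + 1)]


for n in (1, 2, 4, 10):
    scores = attainable_scores(n)
    step = scores[1] - scores[0]
    print(n, "attempted:", len(scores), "possible scores, step of", float(step))
```

With one item attempted only 0 and 100 are possible; with two, 50 becomes possible as well; in general there are one more possible scores than items attempted, so the step between adjacent scores shrinks as attempts increase.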


Since it has therefore been demonstrated that the re-
liability of any score $a/b$ is a function of the size of $b$, as
well as of the test used and the group measured, one may
raise the question: how can a formula for reliability of a
ratio, giving a single answer, be meaningful? The answer
may be obtained by recalling the basic assumption under
which the formula was obtained, to wit, that it holds only
for those cases where $\sigma_b$ is small compared to $b$. This is of
course most likely where either (a) there is little variation
in $b$ scores within the group or (b) values of $b$ are high. In
the former case, all scores will have about the same re-
liability, which can be estimated by formula (4). In the
latter case, the value obtained by the formula is a limiting
value. It was pointed out that the reliability of a ratio
increases as the denominator increases, other things being
equal. Since, as the denominator increases, the case is
approached where powers of $\sigma_b / b$ are completely negligible,
it follows that the value given by formula (4) is valid
only where the assumption is met, and that for lower
values of $b$ one may expect a lower reliability.

⁷ In correspondence with the writer.


Another approach to the same type of statistic gives
a formula for the standard error of measurement. If a
large number of measures of $a$, say $a_1, a_2, a_3, \ldots, a_n$, and
corresponding measures $b_1, b_2, b_3, \ldots, b_n$ are obtained
for the same person by a series of $n$ measurements, a set of
values $i_j$ will also be obtained. The standard deviation of
this set is by definition the standard error of measurement
of the ratio. From (2),

$$\sigma_{i_j}^2 = \frac{M_{a_j}^2}{M_{b_j}^2}\left(v_{a_j}^2 - 2 r_{a_j b_j} v_{a_j} v_{b_j} + v_{b_j}^2\right).$$ (5)

If one assumes that errors in $a$ are independent of errors
in $b$, when the same person is tested repeatedly,

$$\sigma_{i\,\mathrm{meas.}}^2 = \frac{M_a^2}{M_b^2}\left(\frac{\sigma_{a\,\mathrm{meas.}}^2}{M_a^2} + \frac{\sigma_{b\,\mathrm{meas.}}^2}{M_b^2}\right),$$ (6)

or, if $s$ is used as a symbol for the standard error of meas-
urement,

$$s_i^2 = \frac{M_a^2}{M_b^2}\left(\frac{s_a^2}{M_a^2} + \frac{s_b^2}{M_b^2}\right),$$ (7)

where $s_a$ and $s_b$ can be obtained by the usual methods. This
reduces, using the identity $s = \sigma\sqrt{1 - r}$, to (4). It
must again be stressed that this formula is valid only
where $\sigma_b / b$ is small enough that powers may be disregarded.

Since the formulas (4) and (7) are valid for some
sets of scores and invalid for others, some procedure must
be developed to determine where the formula is appli-
cable. A useful test to determine whether the formula
applies to any set of scores is to obtain values for $M_i$ and $\sigma_i$
empirically. If the values are close to those found by
formulas (1) and (2), the assumption may be considered
reasonable in this case; if discrepancy appears, the
formula should not be used.


With such a score as Per Cent Inconsistent on Test 1.41, the
assumption may hold for high values of $b$, but not for
scores based on small values of $b$ (Total Reasons). In
this case it is possible to compute by formula the relia-
bility of those scores where the assumption applies, but
not for cases where the denominator is small. To de-
termine the range where the formula can be used, the
following procedure has been found efficient.


1. A scatter diagram of $a$ against $b$ is made for the sample.


2. An estimate is made that the assumption will hold for values of $b$
greater than a certain value, say $b'$. For all cases where $b$ is equal
to or greater than $b'$, the standard deviation and mean of $a$ and of $b$,
and $r_{ab}$, are computed from the appropriate rows in the scatter diagram.


3. In a separate scatter diagram, $i$ is plotted against $b$, using the same
class intervals for $b$ as before. For all of the cases which fall in rows
so that $b \geq b'$, $\sigma_i$ and $M_i$ are computed by the usual method.


4. Using formulas (1) and (2), $\sigma_i$ and $M_i$ are computed from the data
obtained in step (2). If these values are equal, or virtually so, to
those obtained empirically for the same population in step (3), the
assumption that the variation of $b$ is small compared to $b$ itself is
probably justified for $b \geq b'$.


5. If the values from steps (3) and (4) are equal, it is possible that the
assumption holds for a value $b'' < b'$. If the values from steps (3)
and (4) are not equal, it is necessary to test a value $b'' > b'$. In
either case, a new hypothesis is made, that the assumption holds if
$b \geq b''$. Using the same scatter diagrams, values are calculated for
the means, standard deviations, and $r_{ab}$ for cases at or above $b''$. This
is a comparatively simple step, as most of the previous computation can
be used again. Again, the values of $M_i$ and $\sigma_i$ obtained by formula
are checked against those obtained empirically. By a repetition of
this process, it is possible to determine the smallest value of $b$ for
which the statistics derived empirically and by the formula are equal
within prescribed limits of accuracy. It is probably unnecessary to
compare the means, as a check between the estimated and empirical
standard deviations should be an adequate test; since it requires little
additional work to check means also, it is probably wise to do so.
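
The procedure in the steps above can be sketched in code, working from raw scores rather than from scatter diagrams. Everything here is an assumption made for illustration: the function names, the relative tolerance standing in for the "prescribed limits of accuracy", the downward walk through a fixed list of candidate thresholds, and the made-up scores.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    # Population standard deviation (divide by N).
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def corr(xs, ys):
    # Product-moment correlation; taken as 0.0 when either variable is constant.
    sx, sy = sd(xs), sd(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def compare_at_threshold(a, b, b_min):
    """Steps 2-4: empirical and formula values of (M_i, sigma_i) for b >= b_min."""
    pairs = [(x, y) for x, y in zip(a, b) if y >= b_min]
    if len(pairs) < 2:
        raise ValueError("need at least two cases at or above the threshold")
    a_sel = [x for x, _ in pairs]
    b_sel = [y for _, y in pairs]
    ratios = [x / y for x, y in pairs]
    m_a, m_b = mean(a_sel), mean(b_sel)
    v_a, v_b = sd(a_sel) / m_a, sd(b_sel) / m_b
    r_ab = corr(a_sel, b_sel)
    spread = v_a ** 2 - 2 * r_ab * v_a * v_b + v_b ** 2
    m_i_formula = (m_a / m_b) * (1 - r_ab * v_a * v_b + v_b ** 2)  # formula (1)
    sd_i_formula = (m_a / m_b) * sqrt(max(spread, 0.0))            # formula (2)
    return (mean(ratios), sd(ratios)), (m_i_formula, sd_i_formula)


def agrees(empirical, formula, tolerance):
    # Relative agreement on both the mean and the standard deviation.
    return all(abs(f - e) <= tolerance * abs(e) for e, f in zip(empirical, formula))


def lowest_usable_threshold(a, b, candidates, tolerance=0.02):
    """Step 5: walk downward through candidate thresholds until agreement fails."""
    best = None
    for b_min in sorted(candidates, reverse=True):
        empirical, formula = compare_at_threshold(a, b, b_min)
        if not agrees(empirical, formula, tolerance):
            break
        best = b_min
    return best


# Made-up scores: a = Number Inconsistent, b = Total Reasons.
a = [1, 9, 2, 4, 6, 8]
b = [1, 2, 10, 10, 10, 10]
print(lowest_usable_threshold(a, b, candidates=[1, 10]))
```

In the made-up data the cases with small denominators produce ratios far from what formulas (1) and (2) predict, so the walk stops before admitting them. A real application would use many more cases and class intervals chosen as in the scatter diagrams.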


276 







































Having identified the range of b for which the as- 
sumption holds, one may compute the reliability or 
standard error of measurement by formula (4) or (7). 
Except for $r_{aa}$ and $r_{bb}$, the statistics which enter this
equation have already been computed in the steps above.
In many cases it is most simple to compute $r_{aa}$ and $r_{bb}$ by
the split-half method. If the Kuder-Richardson method 
is used, it is necessary to make an item analysis of those 
papers whose b-values are sufficiently high, separate from 
such an item analysis of all papers as would ordinarily be 
made for other purposes. It is possible to plan the item 
analysis in advance, ranking papers in the order of their 
b-scores, so that the Kuder-Richardson method may be 
applied to a portion of the population economically. 


Summary 


Methods appropriate for computing the reliability of 
ratio scores have been discussed. They are: 


(1) The retest method, which requires construction 
of a parallel form for greatest meaning. The necessity of 
a second testing makes this inapplicable in many situa- 
tions. A coefficient so obtained assumes that the relia- 
bility of all scores in a group is the same. 


(2) The Spearman-Brown formula, applied to the 
correlation between scores based on a splitting of the test
into two parts. This is generally invalid for ratio scores. 


(3) The Kuder-Richardson formula, which may be 
used only where the denominator of the ratio is a con- 
stant. 


(4) The Holzinger formula for the reliability of 
ratios, valid only if the variation of scores in the denomi- 
nator is small compared to the mean of the denominator 
for the group. A related formula for the standard error 
of measurement developed in this paper is valid under 
the same conditions. 


It was pointed out that ratio scores within the same 
population, and even ratios which are equal in size, may 
not have the same reliability. The standard error of 
measurement increases as the denominator decreases. It 
follows that a single reliability coefficient for a ratio score 
is not meaningful, except for data where the variation in 
the denominator is small compared to the denominator. 























GUIDING STUDENTS TO BECOME 
SELF-GUIDING 


JOSEPH S. KOPAS 
Fenn College 


TO adjust oneself to modern living requires a degree
of personal development that informal, hit-and-miss
efforts of the individual do not supply. Too many people 
are frustrated and unhappy because their preparation for 
adjusting themselves to our complex society has been only 
incidental. In this day and age a person requires train- 
ing specifically directed at teaching him how to get the 
most out of life and the most out of himself. He must 
learn to appraise himself, to direct his personal develop- 
ment, to take advantage of opportunities for growth, and 
to evaluate his progress from time to time. The devel- 
opment of these skills should not be left to chance. They 
are far too important for that. An organized, formal 
program of training in self-guidance is needed. 


In recognition of this need, a guidance program has 
been developed over a period of years at Fenn College. 
The use of evaluative procedures is an integral part of 
the program. 


Fenn College utilizes the cooperative plan of educa- 
tion. The students are divided into two groups: one in 
class at the college, the other in full-time work off cam- 
pus. At the end of each three-month period they alter- 
nate. The cooperative work experience received by the 
students is an important factor in the guidance program. 


Objectives of the Guidance Program


At the time the guidance program was first consid-
ered in 1931 as an organized activity of the college, it
seemed logical that the following objectives be kept in
mind:


1. That the guidance program be an integral part
of the college program.

The guidance program should perform so essen-
tial a function that the contribution and progress
of the guidance activities could best be judged by
the progress made by the institution as a whole.


2. That the guidance program be centered on the
normal student.

In too many cases, because of lack of time and
personnel, only the maladjusted individuals in
college get any real assistance from the guidance
program and the normal student is, to a large
extent, disregarded. By stressing the preventive
as well as the adjustment phases of guidance
work, and by developing techniques and methods
which would be helpful to the normal students,
it was hoped that the primary function of the
guidance program would be to help the normal
individual.


3. That all faculty members participate.

It was thought desirable to have every faculty
member share in the program so that all students
could be assisted properly. It was assumed that
every faculty member could do some formal
guidance work and that in doing his part he
could, if given proper help, progressively qualify
himself to do more and to do it better. Further-
more, it was assumed that the counseling expe-
riences could help him to become a more effect-
ive teacher. Therefore, participation was ex-
pected to be of personal value to the instructor
who shared in the program.




Organization of the Guidance Program 


During the past ten years, a guidance program was 
evolved which is in line with the above objectives. A list 
of the features of the program, with a brief description, 
follows: 


1. The guidance program starts with the student.

It helps him face as much of the responsibility
for directing, motivating, and appraising the per-
sonal development as is educationally desirable.
It expects and requires progressively greater as-
sumptions of that responsibility as the student
becomes more experienced and more capable.


2. Each instructor assumes a share of the respon-
sibility in the guidance program as a general
counselor.

Each instructor acts as a general counselor for
at least 10 students. The counselor is the insti-
tution’s representative who assumes the respon-
sibility of seeing that everything within the power
of the college is done to help the student carry
out his program of development to a successful
conclusion. All formal guidance work, except in
abnormal or unusual cases, is carried out through
the counselor.


3. A guidance specialist is provided to serve as a
supervisor of the counselors.

His responsibility is to organize the program, to
select and develop techniques, and to provide
leadership and direction to the guidance pro-
gram.


4. Each freshman student is enrolled in a group
guidance class, called the Orientation Class.

This feature will be described in detail later in
this article.


5. Faculty case board conferences are held at the 
end of each quarter. 

At these conferences the work of all students is 
reviewed and the progress and difficulties dis- 
cussed. This activity is a very effective part of 
the guidance program and provides very good in- 
service training media in guidance techniques 
and methods for counselors. 


6. Various useful data are collected and made avail-
able to both the student and the counselor.

These data include test results, records, and other
vital information.


7. A clinic is provided to deal with problem
students.

Specialists participate in this clinic. This feature
is still in its early stages of development.


The Record and Planning Folder 


Space does not allow a detailed description of each 
feature of the program. However, the Record and Plan- 
ning Folder and the Orientation Class, because of their 
uniqueness and importance in the guidance program, will 
be discussed here in greater detail. 

The common practice is for colleges to keep records 
of the student’s plans, achievements, and experiences. This 
practice takes care of the administrative needs, but does 
not give the student an opportunity to learn how to keep 
his own records. Planning requires that reliable infor- 
mation be gathered, organized, and used. For that rea- 
son it is important that the student learn how to keep 
records as a part of his training in self-guidance. 

Three years ago a group of students in an orientation 
class decided to do some pioneering work in the area of 
personal record keeping. The form developed was called 
the Record and Planning Folder. The students found 
the record very helpful and were quite enthusiastic about 
its value. The folder provided a convenient method of
gathering and organizing information necessary for effect- 
ive self-guidance. 

Most of the information used in the folder was already 
available but seldom organized by the students. The fol- 
lowing items were placed in the Record and Planning 
Folder on specially prepared forms during the freshman
year:


1. Personal history.

This section includes personal data, such as date
of birth, father’s and mother’s names, occupa-
tions, nationality, names and ages of brothers and
sisters in the family, and the student’s employ-
ment experience prior to entrance in college.


2. Autobiography.

The autobiography is a report of approximately
1,500 words containing the highlights of the stu-
dent’s history.


3. Summary of high school record and experiences.

This summary includes all the subjects taken by
the student and the grades, listed chronologically,
as well as the student’s rank in class, honors and
scholarship, special courses, extracurricular ac-
tivities, and his appraisal of his high school ex-
periences.


4. Entrance test results.

Each freshman undergoes a two-day testing pro-
gram as part of the freshman week activities. The
tests are of the type which the average instructor
and student can understand and use. For that
reason, the information is made available to both
the student and the counselor. Areas of testing
and the tests given are as follows:


General Ability Tests ........... A. C. E. Psychological Examination and
                                  Otis Mental Ability Test.
General Background Tests ........ Cooperative General Achievement Tests in
                                  Natural Science, Social Science,
                                  Mathematics, and English.
Training in Special Subjects .... Iowa Placement Examinations in
                                  Mathematics and Chemistry Training.
General Aptitudes ............... Iowa Placement Examinations in
                                  Mathematics, English, and Chemistry
                                  Aptitudes.
Vocational Interests ............ Strong’s Vocational Interest Test, using
                                  the modified scoring.
Special Skills .................. Reading Test.
Personal Characteristics ........ A battery of tests developed by the author.


5. Report on the tentative plans for the school year.

This report covers the general plans for personal
development that have been worked out with
the student’s counselor. In addition to the aca-
demic plans, they include plans for work experi-
ence, extracurricular activities, social develop-
ment, and community and religious activities.


6. Scholastic record for the freshman year.

Subjects taken and the grades received during
each quarter, point averages and rank in class,
and the student’s appraisal of the work done each
quarter are included.


7. Cooperative work experience.

Freshmen normally start on their cooperative
work experience at the end of the third quarter.
Each student is required to write a report about
this work experience. The highlights of this
report, experience received, earnings, the employ-
er’s evaluation of the student’s work, and the stu-
dent’s appraisal of the work experience, are
placed in this section.


8. Record of unusual experiences and opportunities
utilized.

This record includes extracurricular and com-
munity activities in which the student engaged,
worthwhile social and religious experiences, as
well as any unusual activities.


9. Report on life philosophy.

As a part of the Orientation Class activities
the student states his philosophy in terms of a
pattern of beliefs. He inserts this in the Record
and Planning Folder.


10. Appraisal and evaluation of progress and experi-
ence during school year.

In this section the student records the highlights
of the appraisal he and his counselor have made of
his progress and difficulties, and modifications
of plans made as the year progressed.


Provisions in the Record and Planning Folder are 
made for the following information for each succeeding 
year: 

1. Addition to the autobiography.

2. A report on tentative plans for each year. (Made
prior to registration.)

3. Scholastic record.

4. Cooperative work (work experience record).

5. Record of unusual experiences and opportunities
utilized.

6. Appraisal and evaluation of progress during the
school year. (Made at the end of each school
year.)


The Record and Planning Folder might very easily 
become one of the most significant features of the guid- 
ance program because it is helpful in so many different 
ways to the student himself and to the faculty members 
who deal with him. This coming school year every stu- 
dent will maintain the Record and Planning Folder as a 
part of his own guidance efforts. 


The Orientation Class 


The purpose of the Orientation Class is to help stu- 
dents intelligently plan and carry out a program of per- 
sonal development that will lead to successful adjustment 
in major areas of adult responsibilities. It is an activity 
which provides opportunity for formal training in the 
process of self-guidance. 




The students and instructors jointly and mutually 
assume responsibility for planning, organizing, conduct- 
ing, and evaluating the Orientation Class activities. Con- 
sideration of problems of personal adjustment and per- 
sonal development makes up the program. Three weeks 
are spent in planning the program, seven weeks in carry- 
ing it out, and one week in evaluating it. 

The general theme of the course is “Learning to Live
in Our Modern Complex Society.” The areas of adult 
activities which have been chosen to constitute the points 
of emphasis of the course are as follows: Learning to live 
with (1) one’s self, (2) others, (3) one’s job, (4) one’s 
government, (5) one’s estate, (6) one’s culture, and (7) 
one’s family. 

Each student as a member of a committee helps plan 
a program consisting of three one-hour sessions in one 
of the above seven areas that he chooses, and then helps 
conduct the class activities. At the end of the week, each 
student turns in a report which includes his objectives, 
his problems, and his plans for personal development in
the particular area under discussion. By the end of the 
quarter each student has written in detail a report con- 
taining seven sections stating how he intends to utilize 
the opportunities for personal growth to be found in col- 
lege, in his cooperative work, and in community activities. 


The advantages and importance of such a group guid- 
ance activity are readily seen. In the first place, the 
students are oriented into the major areas of adult life 
activities and responsibilities; in the second place, they 
are given a demonstration, through group thinking, of 
how a student, by means of choice and planning of activi- 
ties, learns to assume more responsibility, exercises a 
greater use of his intelligence, attains greater control of 
his behavior, and is able to evaluate his experiences more 
meaningfully, than the student who merely drifts with 
the current of events in college. Finally, in a friendly, 
informal atmosphere, a stimulating environment is pro- 




vided for the student so that he gets a good start on his 
self-guidance program. 


Difficulties Encountered 


Difficulties that are encountered in the guidance pro- 
gram are common and familiar to all guidance workers. 

One of these difficulties is that of making the term 
“learning self-guidance” more concrete, both as to what 
is to be learned and how it is to be practiced. We have 
found that a practical approach to the difficulty is to limit 
the formal training the students are to receive in the area 
of self-guidance to the following three aspects: 

1. The development of a dynamic outlook on life— 
as a source of direction and motivation. 

2. The acquisition of a basic knowledge of the plan- 
ning process—as a means of organizing and direct- 
ing one’s efforts. 

3. The maintenance of a record—as a means of evalu- 
ating one’s progress and as a means of interpreting 
one’s efforts to others. 

Each year that the student is in college he has an 
opportunity to formulate and discuss with his counselor 
the objectives of personal development he wishes to 
achieve during the year and plans for achieving those 
objectives, as well as any modifications of his plans or 
objectives made during the year. At the end of the year, 
he and his counselor evaluate the progress made. If the 
student follows this procedure each year he is in college, 
he will have practiced self-guidance in a very effective 
and worthwhile way.

Another difficulty faced is that of getting the instruc- 
tors, who are busy and often not too interested or qualified 
in guidance work, to put the necessary effort into the job. 
We have tried to minimize the “too busy” problem by 
(1) making the student more active in the program; (2) 
making the information about students quickly and easily 
available and usable; (3) distributing the task equally 
among all the instructors. 




The “not interested” problem was tackled by (1) ex- 
pecting and giving the instructors an opportunity to par- 
ticipate on the theory that participation stimulates inter- 
est; and, (2) getting them to see that the guidance pro- 
gram contributes to the improvement of their main func- 
tion, which is teaching. 

Finally, the “not qualified” problem was handled by 
(1) providing in-service training for the instructors; and, 
(2) simplifying the techniques and devices to a point 
where the average teacher can use them. 

Another difficulty is that of overcoming student indif- 
ference and the tendency to drift. Guidance implies mo- 
tion. Self-guidance, therefore, implies self-propelled mo- 
tion. It is absolutely essential in the guidance program 
that the student take the initiative in developing himself. 
The Orientation Class, the counseling system, and the 
Record and Planning Folder help motivate the stu-
dent to take the initiative, rather than to sit back and
drift with the current of events. 

It is not possible to make an evaluation at this time 
of the complete effectiveness of the guidance program. 
A survey of the results up to this date would show that 
about 20 per cent of the students and 40 per cent of the 
faculty members are doing a good job of their respective 
parts of the program. Almost one-third of the faculty 
and one-fourth of the student body are not functioning 
very effectively. The remainder are doing just a fair 
job. Admittedly, progress has been slow. But the par- 
ticipants are becoming more and more interested as time 
goes on, and the program appears to be growing in ef- 
fectiveness. Progress should be more rapid within the 
next five years now that all the features described are 
included in the program. 


























AN ATTEMPT TO MEASURE SCIENTIFIC 
THINKING 


MAX D. ENGELHART AND HUGH B. LEWIS 
Chicago City Junior Colleges 


POSSIBLY the most challenging problem facing those
interested in the construction and use of objective tests
is the creation of exercises which will require the func- 
tioning of abilities transcending memory. The series of 
exercises presented in this paper may not deserve the label 
of a test of the ability to think scientifically. It seems 
justified, however, to present the series in the hope that 
the form of the exercises may suggest to other and more 
ingenious test makers improved means of measuring abil- 
ities which are among the universally recognized goals 
of science instruction. 

The series of exercises given here follows in its organ- 
ization the steps often regarded as the essence of the 
scientific method. One would be naive to believe that 
scientific problems are always solved in just these steps, 
or that the processes involved in their solution may not be 
more complex. On the other hand, the use of the stages 
represented in these exercises may be appropriate in test- 
ing students. Although the proper function of a test is 
measurement, it may still be legitimate to recognize the 
function of motivation, and exercises of the type classified 
may also accomplish the purpose of engendering in the 
minds of students a belief that knowledge of the scientific 
method is important. 

When the exercises were constructed, it was felt essen- 
tial to present certain introductory statements descriptive 
of the scientific method and of the phenomenon with 
which the problem to be solved is concerned. The distinc- 




tion between the directness and indirectness of the contri- 
bution of a datum in determining the truth or falsity of 
an hypothesis may be somewhat artificial. It is possible 
that the use of three categories rather than five would be 
justifiable. One might argue, however, that scientific 
thinking does involve this evaluation of relevancy of data, 
and that the more direct the contribution, the greater is 
the dependence which may be placed upon the data. 


When the content of the exercises was selected, an 
effort was made to present a phenomenon which in most 
respects would be novel to the students—that is, the phe- 
nomenon would be novel, but the concepts basically in- 
volved would relate to subject matter or principles with 
which the students had had some experience. The exer- 
cises on the operation of the radiometer were developed 
from a series of exercises of a somewhat different type 
which were written by Dr. C. E. Ronneberg of Herzl 
Junior College for the January, 1939, physical science 
comprehensive examination. It is possible to select other 
phenomena for which problems can be stated, and to con- 
struct similar exercises. There is, of course, no necessity 
to restrict such exercises to the field of physics. The par- 
ticular phenomenon and exercises presented here were not 
an altogether appropriate selection so far as the group 
tested was concerned, since the level of difficulty was 
too great. 

The series of exercises was included in a test adminis- 
tered to students entering the Chicago City Junior Col- 
leges who wished to enroll in the second, rather than in 
the first, semester of the physical science survey. The 
exercises and their introductory materials are reproduced 
below: 


A scientist, when confronted with a problem, formulates hypotheses 
which represent tentative solutions to the problem. He then collects 
data which may support or disprove his hypotheses. Finally, on the basis 
of the data and the hypotheses thus tested, he derives a conclusion which 
constitutes his answer to the problem. 




The following exercises represent an effort to test your ability to do 
scientific thinking. You are to test certain true or false hypotheses, and 
to evaluate certain general conclusions. Assume that each item of data 
below each hypothesis is a true statement and may directly or indirectly 
help to prove an hypothesis true or false. 


If the application of the item of data requires only one step to prove 
the truth or falsity of an hypothesis, then the item is a direct help. For 
example, the temperatures of water boiling on a given mountain and 
at sea level would represent direct evidence of the falsity of the hypothe- 
sis “water boils at a higher temperature on a mountain than at sea level.” 


If the application of the item of data requires more than one step to 
prove the truth or falsity of an hypothesis, then the item is an indirect 
help. For example, the item “water in a container that can be evacu-
ated will boil at room temperature” indirectly helps to prove the falsity
of the hypothesis “water boils at a higher temperature on a mountain 
than at sea level.” 


A number of years ago Sir William Crookes perfected an instrument
which always intrigues people, whether laymen or scientists. This
is the radiometer, a device consisting essentially of a paddle wheel
which is free to rotate in a horizontal plane within a partially
evacuated glass bulb. One side of each paddle is brightly polished,
while the other side is coated with lampblack. As soon as the device
is placed in the sunlight, the little paddle wheel starts to spin rapidly.
It continues to spin until the device is again placed in the dark.

[Figure: PADDLE WHEEL FROM ABOVE]


PROBLEM: How does sunlight cause the paddle wheel to rotate? 


Below are given a series of hypotheses, each of which is followed by 
numbered items which represent data. After each item number on the 
answer sheet blacken space 


A if the item directly helps to prove the hypothesis true. 

B if the item indirectly helps to prove the hypothesis true. 

C if the item directly helps to prove the hypothesis false. 

D if the item indirectly helps to prove the hypothesis false. 

E if the item neither directly nor indirectly helps to prove the hy- 
pothesis true or false. 


HYPOTHESIS I: In a partial vacuum the paddle wheel rotates because
of the impact of photons of light.

128. Scientists now believe that light has both corpuscular and wave
     characteristics.

129. In a very high vacuum the bright faces of the paddle wheel turn
     slowly away from the light, while the black faces turn toward the
     light.

130. Light travels at the rate of 186,000 miles per second.

131. In a partial vacuum the black faces of the paddle wheel turn away
     from the light, while the bright faces turn toward the light.

132. Light travels at a slower speed in glass than in air or in a vacuum.

133. After this item number on the answer sheet blacken space A if
     Hypothesis I is true, or space B if it is false.


HYPOTHESIS II: A paddle wheel on which all of the faces are
bright or all are black will not rotate.

134. The black faces of paddles absorb energy from light to a greater
     extent than the bright faces of paddles.

135. Rotation is due to force of impact. If all paddles are the same on
     both sides, either all bright or all black, the turning forces would
     cancel.

136. More photons rebound from bright faces than from dark faces.

137. In a partial vacuum, air molecules are constantly hitting the
     paddles.

138. Photons are hitting the sides of the paddles which face the light.

139. After this item number on the answer sheet blacken space A if
     Hypothesis II is true, or space B if it is false.


HYPOTHESIS III: Rotation in a partial vacuum of the paddle wheel
is due to the greater force of rebound of air molecules from the
black faces than from the bright ones.

140. The bright faces remain cooler than the dark faces, since they
     reflect more light.

141. In a partial vacuum and in the dark the paddle wheel will rotate
     when exposed to invisible infrared rays from a warm flatiron.


142. The black faces of the paddles become warmer than the bright
     faces, since they absorb more light.

143. Air molecules adjacent to the warmer black faces rebound from
     these faces with greater energy than from the cooler bright faces.

144. In a very high vacuum and in the dark the paddle wheel will rotate
     slowly if invisible rays from a cathode tube are directed toward it.

145. After this item number on the answer sheet blacken space A if
     Hypothesis III is true, or space B if it is false.


Below are five conclusions. After each corresponding number on the
answer sheet blacken space

A if in your judgment the conclusion is the best answer to the
  problem.

B if in your judgment the conclusion is neither the best answer nor
  the least satisfactory answer to the problem. (Three conclu-
  sions should receive this mark.)

C if in your judgment the conclusion is the least satisfactory
  answer to the problem.


146. The paddle wheel of the radiometer rotates, because air molecules
     move with greater energy when heated by energy from sunlight or
     from infrared rays from a flatiron.

147. Air molecules rebound with greater force from the bright faces,
     which reflect more light energy. Photons rebound from dark
     faces to a greater extent than from bright faces. The turning
     forces thus created cause black faces to rotate toward the light
     in a partial vacuum and away from the light in a very high
     vacuum.

148. The paddle wheel of the radiometer rotates, because photons of
     light strike air molecules with greater energy when adjacent to
     the dark faces than when adjacent to the bright faces.

149. The fact that a radiometer will operate in either a partial or a
     very high vacuum demonstrates that it is not essential that air
     molecules be present in order to cause rotation.

150. Air molecules rebound with greater force from the black faces,
     which absorb more light energy than the bright faces. Photons
     rebound from bright faces to a greater extent than from dark faces.
     The turning forces thus created cause black faces to rotate away
     from the light in a partial vacuum and toward the light in a very
     high vacuum.




In the table below are presented the proportions of
correct response and correlations of each of the items with
the total score on the test and with the total score on the
part, i.e., the total score on the 23 scientific thinking exer-
cises. These data are based on an analysis of the answer
sheets of a random sample of 200 cases. The selection
was made from the answer sheets of all of the students
taking the test on entrance into the junior colleges.


TABLE 1
ANALYSIS OF ITEMS

                                     Item-Test Correlation
                               Correlation with   Correlation with
Item No.   Key   Item          Total Score on     Total Score
                 Difficulty    Entrance Test      on Part
128        E     [illegible]   .25                .35
129        D     .09           .15                .20
130        E     .55           .58                .61
131        C     .18           [illegible]        .27
132        E     .42           .37                [illegible]
133        B     .21           .18                .30
134        E     .12           .19                .32
135        A     .60           .41                .41
136        E     .18           .15                .33
137        E     .28           .40                .55
138        B     .23           .28                .40
139        A     .65           .40                .38
140        B     [illegible]   .00                .27
141        E     .18           .27                .27
142        B     .20           .08                .36
143        A     .41           .28                .33
144        E     .24           .41                .48
145        A     .33           .09                .29
146        B     .41           .38                .38
147        C     .15           .20                .20
148        B     .50           .35                .45
149        B     .39           .09                .19
150        A     .39           .19                .17


The reliability of the series was found to be .72 by means of
a Kuder-Richardson formula. The series of exercises
correlated .64 with the total score on the entrance test.
Seventy-five of the other exercises were factual, multiple-
answer items dealing with high-school physics and chem-
istry, and 57 were true-false items pertaining to several
passages selected from advanced texts in the physical
science field. These latter exercises emphasized aptitude
more than training in that they were essentially a reading
test in the field of physical science.
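
The reliability figure above is reported only as coming from "a Kuder-Richardson formula." As a minimal sketch, assuming the formula meant is the one now usually called KR-20 and that each item is scored 1 (correct) or 0 (incorrect), the calculation could look like the following. The function name `kr20` and the small response matrix are hypothetical illustrations and are not the authors' data; the use of the population variance of total scores is likewise an assumption.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses: one row per examinee, one 0/1 entry per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Item difficulty p_i: proportion of examinees answering item i correctly.
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    # Sum of item variances p_i * (1 - p_i).
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Variance of examinees' total scores (population form, an assumption).
    var_total = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical answer sheets: 5 examinees, 4 items.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_reliability = kr20(toy_responses)
print(round(toy_reliability, 2))
```

For the series described here, the same function would be applied to the 200 sampled answer sheets scored on the 23 exercises, if those item-level scores were available.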







AN EVALUATION OF TECHNIQUES OF MEASUR- 
ING VISUAL ACUITY AT THE COLLEGE LEVEL 


FRANCES ORALIND TRIGGS 
University of Minnesota 


KARL E. SANDT, M.D. 
University of Minnesota Health Service 


THE UNIVERSITY of Minnesota Health Service and

the University Testing Bureau have been cooperating on 
an evaluation of the Betts Ophthalmic Telebinocular Test to 
determine whether it is a valid screening test of visual acuity 
for use with college students. 

The problem of determining what students should and 
what students should not be referred to an eye specialist is a 
real one because many health services do not have such doctors 
on their staffs and students, many of whom cannot afford it, 
must pay for such service individually. If there is an eye spe- 
cialist on the staff of the health service, it is difficult for him 
to give each student individual attention. The students who 
come voluntarily to him to be examined may be the very ones 
who do not need an examination, and those who do need it 
may never get it, for often a student himself does not know 
when his eyes need attention.

The plan of the research was this: The Betts Telebinoc- 
ular Examination was included as a part of the diagnostic 
reading test battery which is given by the University Testing 
Bureau in cases of suspected reading difficulties. At the time 
the student took the Telebinocular Test, he was given a note 
addressed to Dr. K. E. Sandt at the Health Service asking for 
a complete eye examination. This note was an indication to Dr. 
Sandt that the student was to be included in the research. It 




was intended to have complete data from these tests on 100 
students, but when the data from the records on the Telebinoc- 
ular and from the Health Service records were tabulated, it 
was found that data were complete for only 87 students. The 
measures on which data were available from both sources 
were: visual acuity in the right and left eye separately, 
exophoria and esophoria near and far, and hyperphoria near 
and far. If a student wore glasses he was supposed to be 
tested with and without glasses. If the student wore glasses to 
the examination at the Health Service and the examination at 
the University Testing Bureau, the data with glasses were 
always used. If data were available only from one examination 
without glasses, the data from both services without glasses 
were used.¹

In evaluating these data, certain facts should be kept in 
mind. The Betts Telebinocular² purports to screen out students
having measurable visual defects serious enough to be con-
sidered for correction by an ophthalmologist. In a letter to the
University Testing Bureau from the Bureau of Research of
the Keystone View Company dated April 22, 1940, the follow-
ing bases for referral were given: “(a) You need have no hesi-
tation about referring a patient who fails on any part of Test
3, provided that if there has been a failure on B or C with 
both eyes open you make the test by occluding the eye not 
being tested. If there is still a failure the patient should be 
referred. (b) Test 4 is seldom failed but if it is failed, with- 
out question the patient needs attention. (c) The failure 
of Test 5 alone is not a warrant for referring the patient, 
but if there have been other failures, particularly in Test 2 
and 6A, there is no question but what there is poor eye co- 





¹It should be remembered when interpreting these data that there is no
indication as to whether or not this group of students is a selected sample of the
whole student body as far as visual acuity is concerned. There is no reason to
believe that they would be a selected sample on the criterion of visual acuity
just because they are on reading skills, for it has been shown that there is no
consistent relationship between these two factors.

²For complete description of the instrument, see Emmett Albert Betts, The
Prevention and Correction of Reading Difficulties (Evanston, Illinois: Row, 
Peterson and Company, 1936), pp. 327-50. 




ordination. (d) With high school and university students 
who complain of discomfort in reading, failures of Test 6B 
and 7 taken together are indicative of near point trouble, 


and they should have attention.” It is upon these bases that
referral was determined by the University Testing Bureau for 
this study. The ophthalmologist is trained to determine visual 
defects and decide whether they are serious enough for correc- 
tion; thus it may be seen that there is some overlap of service 
but no overlap of responsibility, the final decision always lying 
with the ophthalmologist as to whether correction shall be 
given. In the light of these stated purposes of the Betts test, 
it would seem that the following questions might be helpfully 
answered: 


1. Will the Betts test screen out for referral to the oculist 
a large number of students in whom the oculist will find 
deficiencies serious enough for correction? 


2. Will the Betts Telebinocular refer a large proportion 
of students whom the oculist finds to have no measurable eye 
difficulty? 


3. By comparing the oculist’s and Betts’ records, on 
individual tests, are all of the Betts measures equally satis- 
factory? 


4. On what tests of the Betts Telebinocular are referrals 
made most often? Can a better basis of referral on the Betts 
Telebinocular be found than the one furnished us by the Key- 
stone View Company? 


The question which always arises in research of this kind 
is whether the tests measure what they purport to measure and 
whether, if they were administered a second time, they would 
give the same results as they did the first time. These ques- 
tions have never been finally answered for either of the tests 
used in this study. It may be that they never can be answered, 
for in both cases the results are dependent upon the physiolog- 
ical status of the individual being tested. The relationship 
of the effects of fatigue, light, and other factors upon different 
individuals may vary so greatly that a constant score may not 




be possible. Or it may even be that there are still to be dis- 
covered better ways of diagnosing visual anomalies. 


In this study, the extent to which the two tests agree on 
diagnosis is indicated, but neither test is assumed to give a 
perfect diagnosis. However, because prescriptions are finally 
made by the oculist, the extent to which the Betts would 
refer to the oculist those people found by him to need correc- 
tion is pointed out. 


The following evaluation of data is presented in answer 
to these four questions: 


1. Of the 87 students included in this study, 13 were given
glasses by the oculist: 11 were given prescriptions to correct
measurable physical eye defects, and two students were given
glasses merely to improve comfort while reading rather than
to correct measurable eye defects.


Of the 11 students given prescriptions to correct measur- 
able physical eye defects, all would have been referred on 
the criteria of referral sent us by the Keystone View Company. 
The remaining two students would not have been referred on 
the basis of these criteria. Thus it will be seen that all students 
found by the oculist to have measurable physical eye defects 
would have been referred for complete examinations as a 
result of the Telebinocular Test. 


2. Of the 87 cases which had both the Betts test and an 
ophthalmic examination, 46 would have been referred by the 
Betts test to the oculist on the basis of the criteria furnished 
us by the Keystone View Company. Of the 46 students who 
would have been referred by the Telebinocular Test, only 11, 
or 24 per cent, had defects serious enough to be corrected 
by glasses. However, it should be remembered that while 
only about 53 per cent of the group would have been referred 
to the oculist for complete testing, 100 per cent would have 
had to be tested, had no pre-test been given. While it might 
be desirable to have a more rigorous screening test, it is cer- 
tainly worth-while to save the oculist from having to examine 
almost half of the students. 
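
The percentages in the paragraph above follow directly from the three counts it reports. The short sketch below recomputes them; the variable names are hypothetical labels chosen for illustration, and only the counts 87, 46, and 11 come from the study.

```python
total_tested = 87        # students with both the Betts test and an oculist's examination
referred_by_betts = 46   # students the Keystone criteria would have referred
confirmed_defects = 11   # referred students given prescriptions for measurable defects

# Share of the whole group that the screening test would send to the oculist.
referral_rate = referred_by_betts / total_tested
# Share of referred students whose defects were serious enough for glasses.
yield_among_referred = confirmed_defects / referred_by_betts
# Share of the group the oculist would not have to examine at all.
examinations_saved = 1 - referral_rate

print(f"referred to oculist:      {referral_rate:.0%}")
print(f"confirmed among referred: {yield_among_referred:.0%}")
print(f"examinations saved:       {examinations_saved:.0%}")
```

Run as written, this reproduces the "about 53 per cent" and "24 per cent" figures and shows that the saving described as "almost half" is about 47 per cent of the group.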




3. The data which bear on this question are presented 
in Table 1. Data were complete from both sources for 87 
students on visual acuity for the right and left eyes. For the 
right eye, we find that 20 students, or 23 per cent, failed the 
Betts test and the oculist’s test; 48, or 55 per cent, passed 
both tests. In other words, it was found that the oculist’s 
diagnosis and the Betts’ diagnosis agreed on 78 per cent of 
the cases. Six students, or 7 per cent, failed the Betts test but 
were found satisfactory by the oculist; and 13, or 15 per cent, 
passed the Betts test but failed the oculist’s test. It should be 
remembered that of these 13 who passed the Betts test but 
failed the oculist’s test, the defect found by the oculist was 
in no case considered serious enough for correction. 




















TABLE 1
COMPARISON OF RESULTS: BETTS TESTS AND OCULIST’S TESTS FOR EIGHTY-SEVEN
SUBJECTS

                                       Right Eye        Left Eye
                                       No.     %        No.     %
Failed Betts and Oculist’s             20     23        13     15
Passed Betts and Passed Oculist’s      48     55        54     62
Failed Betts and Passed Oculist’s       6      7        10     11.5
Passed Betts and Failed Oculist’s      13     15        10     11.5


For the left eye, data were again complete for 87 cases. 
We find that 13 students, or 15 per cent, failed the Betts 
test and the oculist’s test; 54, or 62 per cent, passed both 
tests. Thus the oculist’s and the Betts’ diagnosis agreed on 
77 per cent of the cases. Ten students, or 12 per cent, failed 
the Betts test but were found satisfactory by the oculist; and
10, or 11 per cent, passed the Betts test but failed the oculist’s 
test. It should be remembered that of those 10 who passed 
the Betts but failed the oculist’s tests, the defect found by 
the oculist was in no case considered serious enough for 
correction. 
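
The agreement figures quoted for the two eyes can be checked from the four cell counts in Table 1. The following is a small sketch of that arithmetic; the function name `percent_agreement` and the dictionary layout are hypothetical, and only the counts are taken from the table.

```python
# Counts from Table 1, 87 students per eye, in the order:
# (failed both, passed both, failed Betts only, failed oculist's only)
acuity_counts = {
    "right eye": (20, 48, 6, 13),
    "left eye": (13, 54, 10, 10),
}


def percent_agreement(failed_both, passed_both, betts_only, oculist_only):
    """Share of cases in which the two examinations reached the same verdict."""
    total = failed_both + passed_both + betts_only + oculist_only
    return (failed_both + passed_both) / total


agreement = {eye: percent_agreement(*counts) for eye, counts in acuity_counts.items()}
for eye, value in agreement.items():
    print(f"{eye}: {value:.0%}")
```

The two tests agree on 68 of 87 students for the right eye and 67 of 87 for the left, which rounds to the 78 and 77 per cent given in the text.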

On the measure of vertical imbalance (hyperphoria) far 
point on the Betts test, none of the 72 cases on which data 
are complete for both measures failed the test. (Two stu- 




dents who were included in the study, but for whom no oculist 
measure of vertical imbalance far point was available, failed 
this test.) The oculist found that 10 students, or 14 per cent, 
of the 72 students for whom data were complete on both meas- 
ures had vertical imbalance and 62, or 86 per cent, did not. 


The Betts test for vertical imbalance near point did not 
identify any member of the group with this difficulty, but the 
oculist found that seven, or 10 per cent, of the 72 did have 
vertical imbalance and 65, or 90 per cent, did not have vertical 
imbalance near point. These data would seem to indicate that 
this test is of questionable value for use with college students. 

On the lateral imbalance near point, data are complete 
for 72 cases. Of the 72 cases, four students, or six per cent, 
failed the Betts and the oculist’s tests; two, or three per cent, 
failed the Betts and passed the oculist’s test; 18, or 25 per 
cent, passed the Betts and failed the oculist’s test; and 48, or 
66 per cent, passed the oculist’s test and the Betts test. Thus 
it will be seen that the two measures agreed in 72 per cent 
of the cases. 

For lateral imbalance far point, no students of the 72 who 
were found to be unsatisfactory by the oculist failed the Betts; 
but two students, or three per cent, failed the Betts who were 
found to be satisfactory by the oculist; four, or five per cent, 
passed the Betts test who were found to be unsatisfactory 
by the oculist; and 66, or 92 per cent, were found to be satis- 
factory on both measures. For lateral imbalance far point, 
the two measures agreed in 92 per cent of the cases.
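
Overall agreement can hide how few of the oculist's positive findings a screening test catches. The sketch below recasts the lateral imbalance counts in those terms. It assumes, purely for illustration, that the oculist's finding is treated as the criterion, which the authors themselves decline to assume; the terms "sensitivity" and "specificity" and the function name are not from the original study.

```python
def sensitivity_specificity(failed_both, betts_only, oculist_only, passed_both):
    """Return (sensitivity, specificity) of the Betts test against the oculist.

    failed_both:  failed Betts and failed oculist's test
    betts_only:   failed Betts, passed oculist's test
    oculist_only: passed Betts, failed oculist's test
    passed_both:  passed both tests
    """
    oculist_failures = failed_both + oculist_only
    oculist_passes = passed_both + betts_only
    return failed_both / oculist_failures, passed_both / oculist_passes


# Lateral imbalance counts reported in the text (72 students each).
near_point = sensitivity_specificity(4, 2, 18, 48)
far_point = sensitivity_specificity(0, 2, 4, 66)

print(f"near point: sensitivity {near_point[0]:.0%}, specificity {near_point[1]:.0%}")
print(f"far point:  sensitivity {far_point[0]:.0%}, specificity {far_point[1]:.0%}")
```

Under that assumption, the Betts near point test flags 4 of the 22 students the oculist found unsatisfactory, and the far point test flags none of the 4, even though the far point measures agree in 92 per cent of cases.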

This evaluation of tests would lead us to say that, of the 
parts of the Betts tests studied, the one found to be most 
valuable for referral of college students for a complete eye 
examination is the one of visual acuity for the right and left 
eye. There is not complete agreement between this test and 
the oculist’s measurements, but it does agree in 78 per cent of the 87
cases for the right eye and 77 per cent for the left eye,
and where it does not agree the oculist found no situation to 
exist which was serious enough for correction. This finding 
raises the question as to whether another measure of visual 




acuity not requiring expensive apparatus would serve as satis- 
factorily. Only one other measure of visual acuity was given 
these students. When students enter the University of Minne- 
sota they are required to have a physical examination at the 
Health Service. As a part of that examination the eyes are 
checked by use of the Snellen Chart.³ The record of these
examinations was on file at the Health Service and has been 
tabulated for consideration here. 

There was a record of a Snellen examination for all the 
87 students included in this study. On the basis of that exam- 
ination eight students would have been referred to the oculist 
for complete examination. For this study it is important to 
determine whether these are the same students referred by the 
Betts test and also to ascertain in how many cases the students 
referred were found by the oculist to have a defect serious 
enough to warrant a prescription. 

Of the eight students referred by the Snellen test, six 
would also have been referred by the Betts. As has been 
stated, 13 of the 87 students were found by the oculist to 
have defects serious enough to be corrected by glasses. Of the 
13 given glasses, the Snellen Chart examination would have 
referred only two to the oculist. These data would seem to 
indicate that on measures of visual acuity the Betts test is 
superior to the Snellen Chart in referring students with actual 
difficulty to the oculist for complete examination. 

4. It will be remembered that the Keystone View Com- 
pany gave four standards for referring a student to the oculist 
on the basis of the Betts test. These were: 

(a) Failure on test 3 or any part of test 3 (visual acuity).

(b) Failure on test 4 (vertical imbalance).

(c) Failure on test 5 (coordination) with failure on test 2 (distance fusion) and failure on test 6A (lateral imbalance far point).

(d) A complaint of discomfort in reading with test 6B (lateral imbalance near point) and test 7 (fusion at reading distance).





⁸ Ibid., pp. 149-51.


Of the 46 students who would have been referred to the 
oculist on these standards of referral for the Betts test, 43 are 
identifiable by their failure on the first criterion, i.e., failure on 
one or more of the visual acuity tests. Two of these students 
also failed the vertical imbalance far point test. Two of the 
students who would have been referred failed on both criteria 
one and four. Two failed criterion four only and would 
have been referred on it alone. One student failed and would 
have been referred on criterion three alone. 

Thus, examination of our data would indicate that referral 
on the basis of criteria one, three, and four and the record of a 
complaint of discomfort would have referred all students
given prescriptions by the oculist to correct measurable phys- 
ical eye defects. The number referred would have been 46 
and it would have included the same students referred by the 
four criteria given by the Keystone View Company. It is 
questionable whether the test of vertical imbalance adds any- 
thing essential to this series when used as a screening device 
at the college level. 
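
The referral figures reported above can be put side by side as a rough check. This is a sketch only: the function name and the two summary ratios are illustrative choices rather than quantities the study itself computes, and the Betts counts refer to the full set of referral standards while the Snellen counts refer to visual acuity alone.

```python
def screening_summary(referred, true_cases_referred, true_cases_total):
    """Summarize a screening test against the oculist's prescriptions.

    Returns (sensitivity, referral_yield): the share of students needing
    glasses who were referred, and the share of referrals who needed glasses.
    """
    sensitivity = true_cases_referred / true_cases_total
    referral_yield = true_cases_referred / referred
    return sensitivity, referral_yield


NEEDING_GLASSES = 13  # of the 87 students, per the oculist

# Snellen: 8 referred, 2 of whom were among the 13 given glasses.
snellen = screening_summary(referred=8, true_cases_referred=2,
                            true_cases_total=NEEDING_GLASSES)
# Betts: 46 referred, including all 13 given glasses.
betts = screening_summary(referred=46, true_cases_referred=13,
                          true_cases_total=NEEDING_GLASSES)

for name, (sens, yld) in (("Snellen", snellen), ("Betts", betts)):
    print(f"{name}: sensitivity {sens:.0%}, referral yield {yld:.0%}")
```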

On the basis of the data just presented it does seem that 
the Betts Ophthalmic Telebinocular Test can be used satis- 
factorily by colleges as a screening device for referral of cases 
to the oculist. The Betts test also stands up more satisfactor- 
ily than does the Snellen Chart examination (when given 
under the conditions described) as a measure of visual acuity. 

These conclusions should be checked by repeating this study 
on another group of students, and they should be accepted 
only tentatively for situations other than those described in 
this study. 




















THE CONCEPT OF SCATTER IN THE LIGHT OF 
MENTAL TEST THEORY¹


MAURICE LORR 


U. S. Civil Service Commission 


RALPH K. MEISTER 
Mooseheart Laboratory for Child Research 


THE CONFUSION and loose thinking among clinical
psychologists concerning the basis and significance of
scatter on scales of the Binet type suggest a re-examination
of the concept of scatter in the light of the theory of psychological
measurement.

Theoretically, on mental age scales of the Binet type, items 
are arranged in the order of their difficulty, the easiest item 
first. In clinical practice these items are administered to a 
child in the same sequence. Groups of items, supposedly equal 
in difficulty, are allocated to each year level as representing 
the typical performance of individuals of the corresponding 
chronological age. The child is given increasingly difficult 
items until he reaches a point in the scale above which he fails. 
Actually, no such point exists. Instead, the child passes all 
tests at a certain level and continues with mixed successes and 
failures on to the next higher level until he fails all items pre- 
sented to him in a given level. Such a spread of successes and 
failures over a number of mental year levels is called scatter. 
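
The definition of scatter just given can be illustrated with a small sketch. The record below is hypothetical, as are the function name and the choice to report scatter as the list of year levels lying between the basal level and the ceiling; the article does not prescribe this particular form.

```python
def scatter_range(results):
    """results maps year level -> list of booleans (True = item passed).

    Returns (basal, ceiling): the highest level, counting up from the
    lowest level tested, through which every item is passed, and the
    lowest level above it at which every item is failed. Either value
    is None if that point is not reached.
    """
    levels = sorted(results)
    basal = None
    for level in levels:
        if all(results[level]):
            basal = level
        else:
            break
    ceiling = None
    for level in levels:
        if (basal is None or level > basal) and not any(results[level]):
            ceiling = level
            break
    return basal, ceiling


# Hypothetical record: six items per year level.
record = {
    6: [True, True, True, True, True, True],
    7: [True, True, False, True, True, True],
    8: [True, False, True, False, False, True],
    9: [False, False, True, False, False, False],
    10: [False, False, False, False, False, False],
}
basal, ceiling = scatter_range(record)
mixed_levels = [lv for lv in sorted(record) if basal < lv < ceiling]
print(basal, ceiling, mixed_levels)  # 6 10 [7, 8, 9]
```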

Test theory indicates five possible bases for such irregularity
of performance. First, scatter is a consequence of the
lack of perfect correlation between test items resulting from
the presence of error and from the low communality and
high specificity of the items. The error, as Mosier (10) has
shown, increases in the individual case with greater heterogeneity
of items. Thus, an individual who passes one item at
a given level may not necessarily pass a second item, either
because the two items do not measure the same function or
because of error involved in testing. Illustrative of this lack
of perfect correlation between test items is the Cattell and
Bristol study (4) in which a mean intercorrelation of +.32
was found for seven Binet test items; Wright (15) found the
mean intercorrelation on 31 items to be +.61.

¹ The authors wish to thank Dr. M. W. Richardson for his review of this
article and Dr. M. L. Reymert for his interest and encouragement throughout.


Secondly, there is the fact that the items are incorrectly 
allocated in the order of difficulty. This might be expected 
in view of the fact that Terman and Merrill (12), for exam- 
ple, although using curves of proportions-passing for each 
item in the preliminary grouping of items, had as their goal 
in the final grouping an I.Q. distribution with a mean of 
approximately 100. It is likely that such a procedure resulted
in a grouping of items only roughly ordered as to
difficulty. Therefore, although the grouping of test items is
approximately in the order of their difficulty in the sense that 
an item at age four is definitely less difficult than one at age 
10, nevertheless, in adjacent groups there are probably a great 
many inversions in difficulty. Thurstone’s study (14) on the 
absolute scaling of Binet items shows that items at any particu- 
lar age level vary considerably in difficulty, a finding contrary 
to the assumption of relatively equal difficulty among items 
at any one age level. He found, too, inversions in placement 
according to difficulty. Thus an item which is easier than 
another may be placed at a higher age level. In fact, allocation
at the different age levels on the basis of difficulty is
improbable since the items distribute very unevenly in absolute
difficulty over the age levels. On the basis of Burt’s
data, Thurstone (14) says, “The test questions are more 
numerous at certain ages than at others. For example, there 
are 12 questions that scale at par between the ages five and 
six, whereas there are only four questions that scale at par 
between six and seven.” Any kind of arrangement that requires
the same number of tests at each age level is unlikely to result
in equal gradations of difficulty since the test items used 
do not scale into any such grouping. 
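
An inversion in placement, in the sense used above, can be detected mechanically once each item has a difficulty value on a common scale. The sketch below uses made-up items and takes the proportion of one reference group passing each item as a stand-in for difficulty; both the data and that simplification are assumptions for illustration, not Thurstone's absolute scaling procedure.

```python
def find_inversions(items):
    """items: list of (name, age_level, proportion_passing).

    proportion_passing is taken on one common reference group, so a
    higher value means an easier item. Returns (easier, harder) pairs in
    which the easier item is placed at the higher age level.
    """
    inversions = []
    for name_a, level_a, p_a in items:
        for name_b, level_b, p_b in items:
            if level_a > level_b and p_a > p_b:
                inversions.append((name_a, name_b))
    return inversions


# Hypothetical items at two adjacent year levels.
items = [
    ("A", 8, 0.70),
    ("B", 8, 0.55),
    ("C", 9, 0.62),  # easier than B, yet placed one year higher
    ("D", 9, 0.40),
]
print(find_inversions(items))  # [('C', 'B')]
```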

This fact of the incorrect allocation of test items at the 
various age levels is brought out by the findings of many 
other investigators. Cyril Burt (2) admits that no two 
editors agree about the correct order of mental age items 
and cites instances. Barber (1), on data for the revised 
Form L, found that five items were significantly easier and 
six items were significantly more difficult than their respective 
age placements would indicate. Likewise, Harriman (5) 
found (for the Revised Scale) that test items at year level 
XII seem to be more difficult than those at year level XIII, 
a fact which is confirmed by Carlton (3). Krugman (8) 
found that for New York school children, 25 of the items 
were incorrectly allocated. 

Thirdly, scatter may be due to the lack of discriminatory 
power of certain items. A highly diagnostic item will dis- 
criminate sharply between individuals with ability above that 
required to respond correctly to the item and those individuals 
who lack such ability. For example, two items may be equal 
in difficulty (50 per cent pass at, say, age 11), but differ 
widely in diagnostic value. The psychometric curve for one 
item may extend over, say, years five to 14. The curve for 
another item that is more discriminatory will extend over a 
much smaller age range, such as eight to 12. This spread 
or scatter is manifestly a result of low diagnostic value. 
Thurstone (14) plotted curves of proportions-passing for a 
random selection of items from the Burt-Binet and found a 
“noticeable variation in the slopes of curves.” It is probable, 
therefore, that some of the scatter found is due to these dif- 
ferences in the diagnostic value of the items which these 
examples illustrate. 
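
The contrast between two items of equal difficulty but unequal discriminatory power can be sketched numerically. The example assumes, purely for illustration, that each curve of proportions-passing is a normal ogive over age; the spread values are hypothetical and are not taken from the Burt-Binet data.

```python
from statistics import NormalDist


def passing_curve(difficulty_age, spread):
    """Normal-ogive curve: cdf(age) is the proportion passing at that age."""
    return NormalDist(mu=difficulty_age, sigma=spread)


def transition_range(curve, low=0.05, high=0.95):
    """Ages between which the proportion passing rises from low to high."""
    return curve.inv_cdf(low), curve.inv_cdf(high)


# Both items are passed by 50 per cent at age 11; only the slope differs.
flat_item = passing_curve(11, 2.5)   # low diagnostic value
steep_item = passing_curve(11, 1.0)  # high diagnostic value

for label, item in (("flat", flat_item), ("steep", steep_item)):
    lo, hi = transition_range(item)
    print(f"{label}: 50% pass at age {item.mean:.0f}; "
          f"5%-95% passing between ages {lo:.1f} and {hi:.1f}")
```

The flatter curve spreads its successes and failures over a wider age range, which is the sense in which low diagnostic value contributes to scatter.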

A fourth source of scatter may be found in the fact that 
there is an increase in variability with an increase in absolute 
mean test performance (13). In other words, individuals 
apparently become more variable as they grow older. Since
individuals vary more among themselves, they must vary as to
the number and types of items failed or passed. This can be 
easily seen, since the extent to which Thurstone’s “primary 
mental abilities,” for example, are present varies between 
individuals and within individuals, and the factorial composi- 
tion of test items differs from item to item within the same 
age level. Thus an individual who passes one item at a certain 
age level may not pass another because the latter requires an 
ability which he does not have to the required degree. Again, 
an individual’s failure on five items at a certain level is no 
sure indication that he will fail the sixth, since the sixth item 
may require an ability which he has to a marked degree. This 
tendency of scatter to increase with mean test performance or 
chronological age is checked in actual practice between the 
ages 10 and 12 (Reymert and Meister [11]) for within that 
age range the ceiling of the test begins to limit the amount 
of scatter possible. 


A fifth possible cause of scatter is the presence of sys- 
tematic errors in testing due to language handicaps, sensory 
defects, special training, lack of cooperation, and ambiguous 
scoring or instructions. Unlike chance errors which influence 
the results as often in one direction as in another and there- 
fore can be assumed to cancel out one another, systematic 
errors have a consistent and cumulative effect that gives the 
results a constant bias. Obviously if a test shows constant 
bias for a given individual, it is unsuited for that individual, 
i.e., the individual differs sufficiently from the norm popula- 
tion to render the test inapplicable. If such a test is given, 
the person with language difficulty will tend to fail highly 
verbal items, the uncooperative individual will answer only 
those questions which he can be motivated to try, the individual 
with a slight hearing loss may miss the critical part of the 
question, etc. All of these factors tend to lower the basal 
level which per se gives a larger amount of scatter. 


The uses to which measures of scatter have been put can 
now be critically examined in the light of the sources of scat- 
ter given above. Perhaps the first point that should receive
critical attention is that of the methods of measuring scatter,
for there are a great many such methods and the amount 
of agreement among them seems to be inversely related to 
their number. Theoretically, if scatter represents the range 
of uncertainty of an individual’s ability, the range within 
which lies the limen of his ability or the point beyond which 
he will fail all the items—a point which in actual testing 
practice does not exist—then scatter should be measured on 
a scale of absolute difficulty, as the distance between the 
easiest item failed and the most difficult one passed. Actually, 
as no such measure has ever been used, it is not surprising 
that, to quote Harris and Shakow (6), “research up to now 
has failed to demonstrate clearly any valid clinical use for 
such measures.” These authors mention nine methods of 
measuring scatter and conclude that “at the present time it 
is impossible to state which is the best method of measuring 
scatter.” 
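
The theoretical measure described above, the distance on a scale of absolute difficulty between the easiest item failed and the most difficult item passed, might be computed as follows. The difficulty values are hypothetical, and returning zero when no failed item is easier than a passed one is an assumption of this sketch.

```python
def absolute_scatter(responses):
    """responses: list of (absolute_difficulty, passed) pairs for one child.

    Returns the difficulty of the hardest item passed minus that of the
    easiest item failed, or 0.0 when the two regions do not overlap.
    """
    passed = [d for d, ok in responses if ok]
    failed = [d for d, ok in responses if not ok]
    if not passed or not failed:
        return 0.0
    return max(0.0, max(passed) - min(failed))


# Hypothetical item difficulties on an absolute scale.
responses = [
    (1.0, True), (1.5, True), (2.0, False), (2.5, True),
    (3.0, False), (3.5, True), (4.0, False),
]
print(absolute_scatter(responses))  # 1.5
```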


Now, in view of the uncertainty about measures of scatter, 
the uses to which they have been put become all the more 
questionable. It is common practice to use measures of scatter 
as indicative of epilepsy, psychosis, feeblemindedness, emo- 
tional maladjustment, hypopituitarism, etc. Harris and Sha- 
kow’s paper (6), in fact, deals with “the possibility of obtain- 
ing clinically significant information from numerical measures 
of scatter.” 

Studies which have indicated significant differences in the 
mean scatter for certain groups do not justify diagnosis of 
a particular condition in the individual case. And when one 
considers that for five papers that report such differences, 
there are four that do not (6), even differentiation for groups 
appears questionable. For indicating the type of condition in 
question, there certainly should be a more refined instrument 
than scatter on a test designed to measure intelligence. 

From a consideration of the various bases for scatter, 
it is obvious that their influence is certainly greater than that 
of any chance errors. Therefore, in the light of these con- 
siderations, the use of scatter as even a crude estimate of the 


307 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


measurement error of the test for an individual is not an 
acceptable procedure. 

It is also common practice among clinical psychologists 
to analyze the particular successes and failures of an individual 
to give a crude appraisal of his primary abilities as well as 
estimates of mental deterioration in these abilities. Such 
inspectional analysis is a questionable practice for the fol- 
lowing reasons: First, the factorial composition of an item 
cannot be prejudged accurately. Such judgments are frequently 
in complete disagreement with factor analysis results. For 
instance, Wright (15) found that items involving repeating 
digits backwards did not necessarily involve memory ability 
but rather other factors. And yet how many failures on such 
items have been analyzed in reports as poor memory ability? 
Secondly, an item may be solved through the use of different 
abilities by different individuals and at different age levels. 
Thirdly, items may show fairly high loadings on more than 
one factor so that failure cannot be attributed to the lack 
of any one ability. Fourthly, such clusters of items have too 
low a reliability to have any real diagnostic value. 

In summary, it is concluded that scatter is for the most 
part due to factors inherent in test construction plus certain 
systematic errors. In view of these facts, it appears that 
the possibility of ever securing clinically significant information 
from measures of scatter based on age scales in current use 
is slight indeed.



























REFERENCES

1. Barber, E. R. “A Study of Scatter and the Relative Difficulty of Sub-Tests in the Revised Stanford-Binet,” Master’s thesis, University of Illinois, 1938.

2. Burt, C. “The Latest Revision of the Binet Intelligence Test,” Eugenics Review, XXX:4 (1934), 255-60.

3. Carlton, Theodore. “Performances of Mental Defectives on the Revised Stanford-Binet, Form L,” Journal of Consulting Psychology, IV (1940), 61-5.

4. Cattell, R. B. and Bristol, H. “Intelligence Tests for Mental Ages Four to Eight Years,” British Journal of Educational Psychology, III:2 (1933), 142-69.

5. Harriman, P. L. “Irregularity of Successes on the 1937 Stanford Revision,” Journal of Consulting Psychology, III (1939), 83-6.

6. Harris, A. J. and Shakow, D. “The Clinical Significance of Numerical Measures of Scatter on the Stanford-Binet,” Psychological Bulletin, XXXIV (1937), 134-50.

7. Harris, A. J. and Shakow, D. “Scatter on Schizophrenic, Normal and Delinquent Adults,” Journal of Abnormal and Social Psychology, XXXIII (1938), 100-11.

8. Krugman, Morris. “Some Impressions of the Revised Stanford-Binet Scale,” Journal of Educational Psychology, XXX (1939), 594-603.

9. Mateer, Florence. “Differential Syndromes in Stanford-Binet Failures” (Abstract), Psychological Bulletin, XXXVI (1937), 508.

10. Mosier, Charles I. “Psychophysics and Mental Test Theory: Fundamental Postulates and Elementary Theorems,” Psychological Review, XLVII (1940), 355-66.

11. Reymert, Martin L. and Meister, Ralph K. “A Comparison of the Original and the Revised Stanford-Binet Intelligence Scales,” Educational and Psychological Measurement, I (1941), 67-76.

12. Terman, Lewis M. and Merrill, Maud A. Measuring Intelligence. Boston: Houghton-Mifflin, 1937. Pp. 461.

13. Thurstone, L. L. “The Absolute Zero in Intelligence Measurement,” Psychological Review, XXXV (1928), 175-97.

14. Thurstone, L. L. “A Method of Scaling Psychological and Educational Tests,” Journal of Educational Psychology, XVI (1925), 433-51.

15. Wright, R. E. “A Factor Analysis of the Original Stanford-Binet,” Psychometrika, IV (1939), 209-20.

















MEASUREMENT ABSTRACTS* 


Adkins, Dorothy C. “The Relation of Primary Mental 
Abilities to Preference Scales and to Vocational Choice.” 
Psychometrika, V (1940), 316. (Abstract of a paper 
read at the September, 1940, meeting of the American 
Psychological Association.)





Benge, Eugene J. “Wanted: More Logic and Less Guesswork
in Hiring Salesmen.” Sales Management, XLVIII,
No. 3 (1941), 18-20. 


Companies who have records for many employees, past 
and present, could profitably make an analysis of job require- 
ments. It is suggested that a criterion of job efficiency be set 
up and employees classified accordingly. An outline, based on 
these data, for constructing a rating scale in terms of factors 
which can be elicited at time of application is presented. Mini- 
mum scores on the rating scale are to be established accord- 
ing to scores made by employees. It is emphasized that scores 
on such a rating scale should not be considered alone but only 
in connection with other sources of information in hiring 
employees. D. A. Peterson. 





Blackwell, A. M. “A Comparative Investigation Into the 
Factors Involved in Mathematical Ability of Boys and 
Girls.” British Journal of Educational Psychology, X 
(1940), Pt. I, 143-53, and Pt. II, 212-22. 


A group of 100 boys and a group of 100 girls, ages
ranging from 13½ to 15 years, were given 10 tests of “mathematical
ability” including arithmetical reasoning, analogies,
three “spatial tests,” three “geometric tests,” and a test of
algebraic computation and reasoning. The intercorrelations
for each sex were factored separately. Interpretations were
attempted on the basis of the centroid matrices and on the
basis of an orthogonal rotation as designed to insure a general
factor. Sex differences were found. “The results of the study
seem to confirm the complex nature of mathematical ability...”
Harold Bechtoldt.

*Edited by Professor Forrest A. Kingsbury.





Coombs, Clyde H. “A Factorial Study of Number Ability.” 
Psychometrika, VI (1941), 161-89. 


In order to investigate certain hypotheses concerning the 
nature of number ability, and, secondarily, the nature of per- 
ceptual speed, a battery of 34 tests was given to 223 Chicago 
high school seniors and the data were factored by the centroid 
method. Seven primary factors were identifiable upon rota- 
tion. Several deductions are made relative to the interpreta- 
tion of the factors and relative to the consistency of the data 
with the hypotheses which were to be tested. (Courtesy 
Psychometrika.) 


Coombs, Clyde H. “A Criterion for the Number of Factors 
in a Table of Intercorrelations.” Psychometrika, V 
(1940), 315. (Abstract of a paper read at the Sep- 
tember, 1940, meeting of the American Psychological 
Association.)








Cureton, E. E. “Testing in College Personnel Service.” 
Journal of Consulting Psychology, IV (1940), 221-24. 


A survey of the purposes of a college personnel service 
and of the extent to which available tests lend themselves to 
such purposes. The need for an adequately standardized dif- 
ferential intelligence battery is emphasized. To show progress 
and to give information concerning the pattern of abilities 
and attainments, test scores should be directly comparable. 
Suggestions are made on coordinating test production. W. 
A. Varvel. 





Dwyer, P. S. “The Solution of Simultaneous Equations.” 
Psychometrika, VI (1941), 101-29.


This paper is an attempt to integrate the various methods 
which have been developed for the numerical solution of
simultaneous linear equations. It is demonstrated that many
of the common methods, including the Doolittle method, are 
variations of the method of “single division.” The most useful
variation of this method, in case symmetry is present,
appears to be the Abbreviated Doolittle method. The method 
of multiplication and subtraction likewise can be abbreviated 
in various ways of which the most satisfactory form appears 
to be the new Compact method. These methods are then 
applied to such problems as the solution of related equations, 
the solution of groups of equations, and the evaluation of the 
inverse of a matrix. (Courtesy Psychometrika.) 





Dwyer, P. S. “The Evaluation of Determinants.” Psycho- 
metrika, VI (1941), 191-204. 


The numerical evaluation of determinants with a modern 
computing machine is discussed. Various methods are pre- 
sented and their relations to each other are indicated. The 
methods presented parallel those developed in the previous 
papers on “The Solution of Simultaneous Equations.” Espe- 
cially emphasized are the Abbreviated Doolittle and the Com- 
pact methods. Additional topics include the evaluation of 
partially symmetric determinants by means of symmetric 
methods and the evaluation of determinantal ratios. (Cour- 
tesy Psychometrika.) 





Guilford, J. P. “The Difficulty of a Test and Its Factor 
Composition.” Psychometrika, VI (1941), 66-77. 


A factor analysis of the 10 sub-tests of the Seashore test 
of pitch discrimination revealed that more than one ability is 
involved. One factor, which accounted for the greater share
of the variances, had loadings that decreased systematically 
with increasing difficulty. A second factor had strongest load- 
ings among the more difficult items, particularly those with 
frequency differences of two to five cycles per second. A third 
had strongest loadings at differences of five to 12 cycles per 
second. No explanation for the three factors is apparent, but 
the hypothesis is accepted that they represent distinct abilities.
In tests so homogeneous as to content and form, where a single
common factor might well have been expected, the appearance 
of additional common factors emphasizes the importance of 
considering the difficulty level of test items, both in the 
attempt to interpret new factors and in the practice of testing. 
The same kind of item may measure different abilities accord- 
ingly as it is easy or difficult for the individuals to whom it is 
applied. (Courtesy Psychometrika.) 





Guilford, J. P. “A Note on the Discovery of a G Factor by 
Means of Thurstone’s Centroid Method of Analysis.” 
Psychometrika, VI (1941), 205-8. 


A fictitious factor matrix including 16 tests and three fac- 
tors, one of which was a g factor, was prescribed. From it two 
typical factor problems, including errors of sampling, were 
derived. Students in training, without awareness of the factor 
patterns, arrived at essentially correct solutions by the use of 
Thurstone’s centroid method with rotation of axes. Errors in 
the calculated factor matrix were very close in size to the 
sampling errors in the correlation coefficients. It is concluded 
that a g factor need not escape detection by Thurstone’s pro- 
cedures if the criteria of complete simple structure are not 
demanded. (Courtesy Psychometrika.) 





Horst, Paul. “A Non-graphical Method for Transforming an 
Arbitrary Factor Matrix into a Simple Structure Factor 
Matrix.” Psychometrika, VI (1941), 79-99. 


The most commonly used method of factoring a matrix of 
intercorrelations is the centroid method developed by L. L. 
Thurstone. It is, however, necessary to transform the cen- 
troid matrix of factor loadings into a simple structure matrix 
in order to facilitate the interpretation of the factor loadings. 
Current methods for effecting this transformation are chiefly 
graphical and require considerable experience and personal 
judgment. This paper presents a new method for transform- 
ing an arbitrary factor matrix into a simple structure matrix 
by methods almost completely objective. The theory underlying
the method is developed and approximation procedures
are derived. The method is applied to a matrix of factor loadings
previously analyzed by Thurstone. (Courtesy Psychometrika.)


Hoyt, Cyril. “Test Reliability Estimated by Analysis of Vari- 
ance.” Psychometrika, VI (1941), 153-60. 


A formula for estimating the reliability of a test, based on
the analysis of variance theory, is developed and illustrated. 
The data needed for the required computation are the number 
of correct responses to each item and the score for each sub- 
ject. The results obtained from this formula are identical with 
those from one of the special cases of the Kuder-Richardson 
formulation. The relationships of the new procedure to other 
approaches to the problem are indicated. (Courtesy Psychometrika.)
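
The identity mentioned in this abstract can be checked numerically. The sketch below is not taken from Hoyt's paper: it uses the usual textbook forms of the analysis-of-variance reliability estimate and of Kuder-Richardson formula 20, works from a full subject-by-item score matrix for convenience, and runs on made-up right/wrong data.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a subjects x items 0/1 matrix."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def anova_reliability(scores):
    """Reliability from a two-way (subjects x items) analysis of variance."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    ss_items = n * sum(
        (sum(row[j] for row in scores) / n - grand) ** 2 for j in range(k)
    )
    ss_residual = ss_total - ss_subjects - ss_items
    ms_subjects = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_subjects - ms_residual) / ms_subjects


# Hypothetical responses: six subjects, four items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(abs(kr20(scores) - anova_reliability(scores)) < 1e-9)  # True
```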


Karlin, J. E. “The Isolation of Musical Abilities by Factorial
Methods.” Psychometrika, V (1940), 316. (Abstract of
a paper read at the September, 1940, meeting of the American
Psychological Association.)











Lazarsfeld, Paul F. (Guest Editor.) “Radio Research and
Applied Psychology.” Journal of Applied Psychology,
XXIV, No. 6 (1940), 661-853.

This entire number is devoted to 21 articles dealing with
problems of radio research (including two on magazine advertising
research techniques), not separately abstracted here because
of space limitations. They are classified into the following
five groups of papers: I. “Commercial Effects of Radio”
(F. Stanton, E. Smith & E. Suchman, M. Fleiss, M. Erdélyi);
II. “Educational and Other Effects of Radio” (S. Reid, J. R.
Miles, G. Wiebe); III. “Program Research” (J. N. Peterman,
H. Schwerin, C. Daniel, H. C. Link & P. G. Corby);
IV. “General Research Techniques” (E. A. Suchman & B.
McCandless, M. Rollins, H. Gaudet & E. C. Wilson, D. B.
Lucas, R. Franzen, P. F. Lazarsfeld); and V. “Measurement
Problems” (P. F. Lazarsfeld & W. S. Robinson, C. Daniel,
W. S. Robinson, R. Franzen). F. A. Kingsbury.




Lentz, Theodore F., and Whitmer, Edith F. “Item Synonymi- 
zation: A Method for Determining the Total Meaning of 
Pencil-Paper Reactions.” Psychometrika, VI (1941), 
131-9. 


Items have been studied heretofore for their value as 
elements of particular tests to the neglect of more funda- 
mental research into the multiple potentiality of items. This 
article proposes a method of grouping items into “synonymies”
comprising all of the items which correlate with a given key 
item. These synonymies can be used for interpretation of the 
total meaning of the key item: (1) by inspection of the con- 
stituent items and (2) by correlational study of obtained 
single scores of individual persons. The method is illustrated 
by four items with inter- and intra-correlations, and charac- 
teristics of an ideal background reservoir of items are pointed 
out. (Courtesy Psychometrika.)





Martin, D. R. “Mental Tests in Clinical Practice.” Austral-
asian Journal of Psychology and Philosophy, XVIII 
(1940), 144-53. 


The author discusses the purpose of mental testing in child- 
guidance work and describes the battery of tests for general 
intelligence, special abilities and disabilities, school achieve- 
ment, personality traits, and emotional stability in use at his 
clinic. References are made to the recent controversy between 
Cattell and Vernon over the value of the Binet test. 
W. A. Varvel. 


McNemar, Quinn. “On the Sampling Errors of Factor Loadings.”
Psychometrika, VI (1941), 141-52.


The results of three empirical studies on the sampling 
fluctuation of centroid factor loadings are reported. The first 
study is based on data which happened to be available on 8 
variables for 700 cases and which were factored to three fac- 
tors for subsamples. The second study is based on fictitious 
data for 2500 cases which provided separate analyses on 25 
samples for each of three situations: five variables, one factor; 
five variables, two factors; and six variables, three factors.
The third study, based on real data for nine variables and
7000 cases, involves separate factorization for 25 samples of 
200 cases. The three studies agree in showing that the sam- 
pling behavior of first centroid factor loadings is much like 
that of correlation coefficients, whereas the sampling fluctua- 
tions for loadings beyond the first are disturbingly large. 
(Courtesy Psychometrika.)





McNemar, Quinn. “More on the Iowa I.Q. Studies.” Journal
of Psychology, X (1940), 237-40.


In a reply to the Wellman-Skeels-Skodak review [Psychological
Bulletin, XXXVII (1940), 93-111] of his original
critique of the Iowa studies on environmentally-determined
changes in I.Q., the author does not find it necessary to modify
materially his previous criticisms. W. A. Varvel.





McNemar, Quinn. “On the Number of Factors.” Psychometrika,
V (1940), 315. (Abstract of a paper read at
the September, 1940, meeting of the American Psychological
Association.)





Porter, E. K. “Criteria of a Good Examination.” Public
Health Nursing, XXXII (1940), 558-64.


Steps in the construction of a test are outlined. Principles 
for the construction of tests and test items are presented. 





Schaefer, Willis C. “The Relation of Test Difficulty and Factorial
Composition Determined from Individual and Group
Forms of Primary Mental Abilities Tests.” Psychometrika,
V (1940), 316-17. (Abstract of a paper read at the
September, 1940, meeting of the American Psychological
Association.)





Van Steenberg, N. J. “Analysis of Mental Growth of School
Children.” Psychometrika, V (1940), 314. (Abstract of
a paper read at the September, 1940, meeting of the
American Psychological Association.)










MEASUREMENT NEWS* 


Dr. John C. Flanagan has been granted a year’s 
leave of absence from the Cooperative Test Service in 
order that he may accept a commission as a reserve officer 
in the Army Air Corps. Dr. Flanagan will direct de- 
velopmental researches and make practical applications 
with regard to problems of selection of Air Corps 
personnel. 





The authors of the Chicago Reading Tests, Drs. Max 
D. Engelhart and Thelma Gwinn Thurstone, are con- 
ducting an investigation of the comparability of the 
norms of these tests from form to form and also their 
comparability with the norms of the Metropolitan and 
Stanford Reading Tests. Each series of forms of the 
Chicago tests was standardized independently in succes- 
sive years by administration to pupils in a representative 
sample of 30 Chicago elementary schools. Approxi- 
mately 8,000 elementary pupils took each form when it 
was given for standardization. In addition, two forms of 
the sixth-, seventh-, and eighth-grade test were adminis- 
tered in successive years to approximately 8,000 Chicago 
high-school pupils. The assumption which was made 
and is now being tested was that norms based on large 





*Notes for this department should be sent to Dr. M. W. Richardson, United 
States Civil Service Commission, Washington, D. C. 




samples of pupils drawn from the same schools should be 
comparable. In the current study each of the three forms 
of each of the four Chicago tests, and the appropriate 
Metropolitan and Stanford tests, were administered to 
several hundred pupils in randomized order to control 
practice effect. For example, the three forms of Chicago 
Reading Test D and the Metropolitan and Stanford Ad- 
vanced Reading Tests were administered to the same ele- 
mentary pupils. It is planned to determine the equiva- 
lence of the raw scores and, on the basis of these data, 
to make any necessary adjustments to secure precise com- 
parability of the norms. 
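
The randomized order of administration mentioned in this note can be illustrated with a minimal sketch. It is not the investigators' actual procedure: the form labels, the pupil count of 300, and the function name `assign_orders` are all hypothetical, and the sketch simply assumes that each pupil takes every test in an independently shuffled order, so that practice effects are spread over the tests rather than favoring whichever test is given last.

```python
import random

def assign_orders(pupil_ids, tests, seed=0):
    """Return {pupil: shuffled list of all tests} (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    orders = {}
    for pupil in pupil_ids:
        order = list(tests)    # every pupil takes every test
        rng.shuffle(order)     # independent random order for each pupil
        orders[pupil] = order
    return orders

# Hypothetical labels for the five tests in the note's example.
tests = [
    "Chicago D, Form 1",
    "Chicago D, Form 2",
    "Chicago D, Form 3",
    "Metropolitan Advanced",
    "Stanford Advanced",
]
orders = assign_orders(range(1, 301), tests)  # 300 pupils is an assumption
```

With orders assigned in this way, the average position of each test in the sequence is roughly the same, so any differences among mean raw scores are less likely to reflect practice alone.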







Professor Karl J. Holzinger of the University of Chi- 
cago has written a treatise on Factor Analysis with the 
assistance of Harry H. Harman, Research Associate. This 
volume is being published by the University of Chicago 
Press and will be ready this fall. Professor Holzinger 
is also joint author of two new monographs on the appli- 
cation of factorial methods. The collaborating authors 
are M. A. Wenger, Frances Swineford and Harry H. 
Harman. These monographs will also be published by 
the University of Chicago Press. 





Wright Junior College in Chicago has recently begun 
a three-year study on the evaluation of terminal educa- 
tion. The study is a part of a comprehensive investiga- 
tion of junior college terminal education being carried 
on in nine selected junior colleges by the American Asso- 
ciation of Junior Colleges with a grant made by the 
General Education Board. 


The Wright study will attempt to evaluate the pres- 
ent terminal general and terminal occupational programs 




offered at that institution. One of the purposes of the 
study will be the development of techniques of evaluation 
for use by other schools. 


An extensive measurement program will be initiated 
in September for the incoming freshman class. Measurements 
will be made in twelve areas: effective thinking, 
command of skills and understandings in the major cultural 
areas, functional understanding of the basic facts of 
health and disease, interests, appreciations, consumer 
competence, occupational efficiency, personal-socio 
adaptability, socio-civic consciousness, attitudes, worthy use of 
leisure, functional philosophy of life. Several measuring 
devices now available will be used as well as others which 
are now being developed as a part of the study. Those 
in the former category include the Cooperative Test Service 
General Culture and Contemporary Affairs tests, the 
Kuder Preference Record, and two of the tests of the 
Progressive Education Association on interpretation of 
data and nature of proof. 


The study is being conducted by a group composed of 
Dean William H. Conley, Bernard Gold, Alice Griffin, 
Max D. Engelhart, and Leland Medsker. 





Soon to be published by the Social Science Research 
Council is a monograph on the prediction of personal 
adjustment. The text has been prepared by Dr. Paul 
Horst of Procter and Gamble. The monograph will deal 
with personal adjustment in connection with vocations, 
schools, marriage, and criminal recidivism. 











